In 1962 Hajek proposed a test for location which was uniformly
asymptotically fully efficient over a large class of distributions.
Van Eeden subsequently derived the asymptotic theory for the
corresponding R-estimate. Many authors have expressed reservations with
regard to the small sample performance to be expected from this
approach. In contrast to the methods of Hajek and Van Eeden, the
procedure proposed here uses the entire data set to estimate the score
function of the locally most powerful rank test. Using nearest neighbor
density estimation methods, an estimate of the score function for
the asymptotically most powerful grouped rank test is constructed.
This function is itself a step function approximation to the score
function for the locally most powerful rank test. Large sample
distribution and optimality results are obtained for both the adaptive
rank test and the corresponding R-estimate of location. Small sample
Monte Carlo results are provided.
1.  INTRODUCTION


In  1955,  Stein  [27]  suggested  that  full  asymptotic  efficiency 
in  testing  and  estimation  for  a  large  class  of  distributions  could  be 
obtained  by  utilizing  the  information  about  the  density  contained  in 
the  data.  This  idea  was  further  developed  by  Hajek  [10]  and 
Van  Eeden  [31].  Hajek  demonstrated  that  by  estimating  the  score 
function of the locally most powerful rank test, $-f'(F^{-1}(u))/f(F^{-1}(u))$,
an  asymptotically  most  powerful  test  of  location  could  be 
constructed.  Using  the  methods  of  Hodges  and  Lehmann  [12],  Van  Eeden 
utilized Hajek's test to form an asymptotically fully efficient
estimate of location. These results are interesting theoretically; but
many  authors,  e.g.,  Hajek  [10],  Switzer  [29],  Huber  [15],  Wesley 
[32],  and  Hogg  [13],  have  expressed  reservations  concerning  the  small 
sample properties achievable through this approach. There are serious
questions about the applicability of the results of Hajek and
Van  Eeden  to  real  world  data.  In  their  approach  the  sample  is  split 
into  two  parts  which  are  used  for  separate  purposes.  The  first  part 
is  a  vanishingly  small  fraction  of  the  data  which  is  used  to  estimate 
the  score  function.  The  second  part  of  the  data  determines  the  ranks 
to  be  used  with  the  coefficients  based  on  the  first  part.  This 
approach  raises  doubts  about  the  stability  of  the  estimate  of  the 
score  function  for  small  samples.  Beran  [5]  has  partially  avoided 
this  difficulty,  but  his  proposed  estimate  of  the  score  function  also 
appears likely to perform poorly for small samples. No Monte Carlo
results are available concerning the small sample performance of any
of the above-mentioned adaptive estimates of location.

The  methods  we  employ  differ  from  those  of  Hajek  and 
Van  Eeden  mainly  in  two  respects.  First,  the  entire  data  set  is  used 
to  estimate  the  score  function  and  to  perform  the  resulting  test  of 
location.  Secondly,  by  the  use  of  density  estimation  techniques,  an 
estimate  of  the  score  function  of  the  asymptotically  most  powerful 
grouped  rank  test  is  constructed.  This  function  is  itself  a  step 
function  approximation  to  the  score  function  of  the  locally  most 
powerful  rank  test;  and  in  approximating  the  score  function  in  this 
way,  it  is  hoped  to  gain  small  sample  stability  of  the  estimate. 

In  the  manner  of  Hodges  and  Lehmann  [12],  and  following 
Van  Eeden,  it  is  possible  to  form  an  estimate  of  location  from  the 
resulting  rank  test,  i.e.,  an  R-estimate.  It  is  hoped  that  if  this 
two-sample  rank  test  adapts  well  to  the  data,  the  same  should  be  true 
of  the  resulting  estimate  of  location.  Large  sample  optimality 
results  are  shown  for  both  the  proposed  adaptive  rank  test  and  the 
associated estimate of location. Finally, small sample Monte Carlo
results are provided.
2.  DEVELOPMENT  AND  HISTORY  OF  ADAPTIVE  RANK  TESTS


2.1.  The  Testing  Framework 

Assume that we have two samples $(x_1, x_2, \ldots, x_n)$ and
$(y_1, y_2, \ldots, y_n)$. Let the $X_i$ be i.i.d. $F(x)$ and the $Y_j$
be i.i.d. $F(x-d)$, with $F$ symmetric with density $f$. Let $N = 2n$.
Also, let $\omega = (\omega_1, \ldots, \omega_N)$ be defined by

$$\omega_i = \begin{cases} 0 & \text{if the $i$th order statistic of the combined sample is an $x$,} \\ 1 & \text{otherwise.} \end{cases}$$

Then a linear rank test is a statistic of the form
$\sum_{i=1}^{N} c(i/(N+1))\,\omega_i$, where $\{c(i/(N+1))\}$ is a vector
of constants determined by some function $c(\cdot)$. A grouped rank
test is a statistic with the above form and the added constraint that
the number of different $c(i/(N+1))$ be finite, i.e., the range of
$c(\cdot)$ is a finite set. Let us assume, as above,
that  the  second  sample  Y  is  d  units  to  the  right  of  the  first  sample, 
X. Then to see the $\omega$-vector more clearly one could write down the
order statistics of the combined sample and replace every value with
0 or 1, depending on which sample that data point came from. A
typical $\omega$-vector with the above shift might look like
00000101100101110111. It is easy to imagine the nature of a desirable
vector of constants from the chain of 0's and 1's. One would
like a statistic which is large for positive changes in the location
of the second sample. The constants for the intermediate $\omega_i$'s are
relatively unimportant, since the 0's and 1's may be equally
represented there. At the extremes, however, the pattern of 0's and 1's
depends heavily on the size of the tails of the distribution. For a
given displacement a typical $\omega$-vector for a short-tailed
distribution might look like 00000001001110111111, whereas a typical
$\omega$-vector for a long-tailed distribution might look like
10100001001110111001. The coefficients should take into account this
dependence on the size of the tails by attempting to place more weight
on the extremes of the $\omega$-vectors for distributions with short
tails than for those with long tails.
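The construction of the $\omega$-vector and of a linear rank statistic described above can be sketched in a few lines of Python. This is only an illustration: the function names `omega_vector` and `linear_rank_statistic`, the toy data, and the linear score function $c(u) = u - 1/2$ are assumptions made here, not part of the original development.

```python
def omega_vector(x, y):
    """Return the 0/1 vector of the combined ordered sample.

    Entry i is 0 if the i-th order statistic came from x and 1 if it
    came from y. Continuous data (no ties) are assumed.
    """
    labelled = [(v, 0) for v in x] + [(v, 1) for v in y]
    labelled.sort(key=lambda pair: pair[0])
    return [label for _, label in labelled]


def linear_rank_statistic(x, y, c):
    """Compute sum_i c(i/(N+1)) * omega_i for a score function c."""
    omega = omega_vector(x, y)
    n_total = len(omega)
    return sum(c(i / (n_total + 1)) * w for i, w in enumerate(omega, start=1))


# Toy data: the y sample sits to the right of the x sample.
x = [0.1, 0.5, 0.9, 1.4]
y = [0.7, 1.2, 1.8, 2.5]
omega = omega_vector(x, y)
# Illustrative linear scores c(u) = u - 1/2 (an assumption for this sketch).
stat = linear_rank_statistic(x, y, lambda u: u - 0.5)
print(omega, stat)
```

A shift of the second sample to the right moves the 1's toward the end of the vector, so any increasing score function yields a larger statistic.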

It is well known, Hajek [10], that for testing the hypothesis
that $d > 0$ versus $d = 0$, the locally most powerful rank test (LMPRT)
has $c(i/(N+1)) = J(i/(N+1)) = -f'(F^{-1}(i/(N+1)))/f(F^{-1}(i/(N+1)))$. We shall
consider the possibility of estimating $J(u)$ from the data and using
the resulting estimate in a rank test, i.e., adapting. Such a test
would, one hopes, have large-sample optimality properties as well as
good small-sample performance for a large family of distributions,
and we will verify that such is the case for the proposed test. When
one is unsure of the actual distribution of the data, one could
consider using such a test instead of, say, the normal scores test or
the sign test (two-sample tests which are asymptotically most
powerful for testing $d > 0$ against $d = 0$ for data which have,
respectively, a normal or a Laplace distribution).
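For the two distributions just mentioned, the score function $J(u) = -f'(F^{-1}(u))/f(F^{-1}(u))$ has a simple closed form, which the following sketch evaluates at the points $i/(N+1)$. For the standard normal, $-f'(x)/f(x) = x$, so $J(u) = \Phi^{-1}(u)$; for the Laplace density $e^{-|x|}/2$, $-f'(x)/f(x) = \operatorname{sign}(x)$, so $J(u) = \operatorname{sign}(u - 1/2)$. The function names and the choice $N = 10$ are illustrative assumptions.

```python
from statistics import NormalDist


def score_normal(u):
    """J(u) for the standard normal: the normal quantile function."""
    return NormalDist().inv_cdf(u)


def score_laplace(u):
    """J(u) for the Laplace density exp(-|x|)/2: the sign of u - 1/2."""
    if u > 0.5:
        return 1.0
    if u < 0.5:
        return -1.0
    return 0.0


N = 10  # combined sample size, chosen only for illustration
normal_scores = [score_normal(i / (N + 1)) for i in range(1, N + 1)]
laplace_scores = [score_laplace(i / (N + 1)) for i in range(1, N + 1)]
print(normal_scores)
print(laplace_scores)
```

The normal scores grow steadily toward the extremes, while the Laplace scores are flat there, which matches the remark that short-tailed distributions call for more weight on the extremes of the $\omega$-vector.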

2.2.  Density  Estimation 

To estimate functions which are approximations to the score
function, estimates of the density, at least, and possibly of its
derivative need to be considered. There are two main types of density
estimates currently in the literature, kernel estimates and nearest
neighbor density estimates. Kernel estimates were initially developed
by Rosenblatt [25] and Parzen [22]. Nearest neighbor density estimates
were developed by Fix and Hodges [8], and more formally by
Loftsgaarden and Quesenberry [19].

In  density  estimation  we  are  actually  estimating  an  integral 
of  the  density  over  an  interval  containing  the  point  at  which  we  wish 
to  determine  the  density,  and  then  dividing  by  the  length  of  the 
interval.  This  estimates  an  average  of  the  density  in  a  neighborhood 
of  the  point.  The  differences  in  the  two  procedures  lie  in  the 
determination  of  the  interval.  In  kernel  estimation,  we  (essentially) 
preset  a  band-width  and  count  the  number  of  points  in  that  band  around 
the  point.  In  nearest  neighbor  estimation,  we  decide  on  how  many  data 
points  we  would  like  to  use  for  each  point,  here  designated  as  k(N) 
where N is the size of the sample, and determine the smallest
symmetric band around the point which encloses the required number of
data points.
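The contrast between the two procedures can be made concrete with a minimal sketch. The fixed-window estimate below uses a uniform (box) kernel, and the nearest neighbor estimate uses the form $k/(2 n D_k(x))$, where $D_k(x)$ is the distance from $x$ to its $k$-th nearest data point. The function names, the toy data, and the convention that a data point counts as its own nearest neighbor are assumptions of this sketch.

```python
def box_kernel_density(data, x, h):
    """Fixed band-width estimate: points within h of x, over 2 h n."""
    count = sum(1 for v in data if abs(v - x) <= h)
    return count / (2.0 * h * len(data))


def knn_distance(data, x, k):
    """Distance from x to its k-th nearest data point."""
    return sorted(abs(v - x) for v in data)[k - 1]


def knn_density(data, x, k):
    """Nearest neighbor estimate k / (2 n D_k(x))."""
    return k / (2.0 * len(data) * knn_distance(data, x, k))


data = [0.0, 0.2, 0.5, 1.0, 1.5, 3.0]
# In the centre both windows contain several points.
centre_box = box_kernel_density(data, 0.5, h=0.5)
centre_knn = knn_density(data, 0.5, k=3)
# In the tail the fixed window holds one point; the k-NN window widens.
tail_box = box_kernel_density(data, 3.0, h=0.5)
tail_knn = knn_density(data, 3.0, k=3)
print(centre_box, centre_knn, tail_box, tail_knn)
```

At the isolated point 3.0 the fixed window rests on a single observation, whereas the nearest neighbor window stretches until it holds three.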

We  will  concentrate  on  nearest  neighbor  estimates  in  the  hope 
of  obtaining  stable  estimation  in  the  tails  where  the  density  of 
observed points is low. Better estimation in the tails will
correspond to better estimation of the score function $J(u)$ for $u$
near zero and one. This in turn should lead to better estimation of
the coefficients of the rank test for $\omega_i$'s in the crucial regions
as described in the previous section. We feel that nearest neighbor
estimates should perform better than kernel estimates for small
samples because the size of the window is adapted to the number of
data points in the area, instead of being fixed as in kernel estimation.

The results that follow only use properties of density function
estimates that are commonly held by both kernel estimates and
nearest neighbor estimates. Both types of estimate are clearly location
invariant, since if we add a constant c to the data and
estimate the density at x+c, we will get the same value as the
density estimate at x of the untransformed data. Both estimates are
continuous. Also, both estimates converge uniformly with probability
one. We add that under general conditions which will not be
described here, various convergence results are shown to hold
simultaneously for both types of estimate in the work of Moore and
Yackel [20]. Therefore, any future investigators could certainly study
the procedures to be described here with kernel estimates replacing
nearest neighbor estimates everywhere.


2.3.  An  Adaptive  Rank  Test  Coefficient 

It is our objective to construct an estimate of $J(u)$ which is
well behaved for small samples. We recall that
$J(u) = -f'(F^{-1}(u))/f(F^{-1}(u))$. There is certainly no difficulty in
obtaining an estimate $\hat F^{-1}$ of $F^{-1}$. The most commonly used
estimate is $\hat F^{-1}(u) = X_{([nu]+1)}$, where $X_{(i)}$ represents the
i-th order statistic from a sample of n i.i.d. X's. For this estimate,
under the condition of absolute continuity, we have convergence with
probability one by proposition (i), page 423, Rao [24]. There exist in
the literature several estimates of both $f'$ and $f$, but these tend
not to be well-behaved for small samples, especially the estimates of
$f$. It probably is not a good idea to divide by these unstable
estimates, so direct substitution in the formula for $J(u)$ will very
likely do poorly. However, noticing that

$$J(u) = -\frac{d}{du} f(F^{-1}(u)) = -\lim_{h \to 0} \frac{f(F^{-1}(u+h)) - f(F^{-1}(u-h))}{2h},$$

we might consider using
$\hat J(u) = -[\hat f(\hat G(u+b)) - \hat f(\hat G(u-b))]/2b$ for some
fixed $b$, where $\hat f$ is some estimate of the density, and
$\hat G(u) = \hat F^{-1}(u)$.

Pursuing  this  notion,  let  us  proceed  as  follows: 

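Let $0 = \lambda_0 < \lambda_1 < \cdots < \lambda_{k+1} = 1$ be any $k$
fractiles (i.e., partitions of [0,1]), and let $P(i) = [N\lambda_i] + 1$.
Then define an estimate of $J(u)$ by

$$\hat L_n(u) = -[\hat f(\hat G(\lambda_i)) - \hat f(\hat G(\lambda_{i-1}))]/(\lambda_i - \lambda_{i-1}),$$

for $\lambda_{i-1} < u < \lambda_i$.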

This estimate is now identical in functional form to the score
function of a test which was proposed by Gastwirth [8]. In this
paper, Gastwirth showed that the test with score function

$$L(u) = [f(G(\lambda_{i-1})) - f(G(\lambda_i))]/(\lambda_i - \lambda_{i-1})$$

generates the asymptotically most powerful grouped rank test based on
the fractiles $\{\lambda_i\}$ when the density underlying the data is $f$,
under certain regularity conditions on $f$. We remark that a grouped
rank test statistic is a statistic of the form

$$\sum_{j=1}^{k+1} c_j \Bigl( \sum_{i=P(j-1)}^{P(j)} \omega_i \Bigr).$$

This collects the $\omega$-vector's contribution to the test statistic
into $k+1$ groups and gives each observation in any group equal weight.

Thus we may consider the rank test using the proposed estimate
of $J(u)$ as an adaptive asymptotically most powerful grouped rank
test.
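The step-function estimate of $J(u)$ defined in this section can be sketched as follows, under stated assumptions. The nearest neighbor density estimate $k/(2 n D_k(x))$ and the quantile estimate $X_{([nu]+1)}$ are taken from the text; the function names, the simulated normal sample, the choice of fractiles and of $k$, and in particular the convention that the density estimate is set to 0 at $\lambda_0 = 0$ and $\lambda_{k+1} = 1$ are assumptions made only for this illustration.

```python
import random


def empirical_quantile(sorted_data, u):
    """G_hat(u) = X_([n u] + 1), clipped to the largest order statistic."""
    n = len(sorted_data)
    return sorted_data[min(int(n * u), n - 1)]


def knn_density(data, x, k):
    """Nearest neighbor density estimate k / (2 n D_k(x))."""
    d_k = sorted(abs(v - x) for v in data)[k - 1]
    return k / (2.0 * len(data) * d_k)


def estimated_step_scores(data, interior_fractiles, k):
    """One coefficient per interval (lambda_{i-1}, lambda_i), i = 1..k+1."""
    xs = sorted(data)
    lambdas = [0.0] + list(interior_fractiles) + [1.0]
    # Assumption of this sketch: the density estimate is 0 at u = 0 and u = 1.
    density = [0.0]
    for lam in lambdas[1:-1]:
        density.append(knn_density(xs, empirical_quantile(xs, lam), k))
    density.append(0.0)
    return [
        -(density[i] - density[i - 1]) / (lambdas[i] - lambdas[i - 1])
        for i in range(1, len(lambdas))
    ]


rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
fractiles = [0.1, 0.3, 0.5, 0.7, 0.9]
scores = estimated_step_scores(sample, fractiles, k=40)
print(scores)
```

With these conventions the first coefficient is negative and the last is positive, and the coefficients weighted by the interval lengths sum to zero because the density differences telescope.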

Hogg  [13]  suggested  an  approach  similar  to  this  in  his  review 
of adaptive robust procedures. Quoting, "but I can imagine J(u)
being  estimated  by  a  curve  constructed  from  a  few  line  segments." 

The  estimate  proposed  here  may  be  regarded  as  a  development  of  this 
suggestion.  Also,  after  the  present  work  was  begun,  Parzen  [23] 
noted  the  equality  of  the  score  function  to  a  single  derivative  of 
f(G(u))  with  respect  to  u  and  mentioned  the  possibility  of  using  this 
fact  in  analyzing  data. 


2.4.  Comparison  With  Previous  Work 

Hajek [1962] considered the following procedure. The $X$ and $Y$
samples are randomly split into two sets each: $W_n = (y_1, \ldots, y_{r_n})$,
$Z_n = (x_1, \ldots, x_{r_n})$, $U_n = (y_{r_n+1}, \ldots, y_n)$,
$V_n = (x_{r_n+1}, \ldots, x_n)$. Let $r_n \to \infty$ and $r_n/n \to 0$ as
$n \to \infty$. Estimate $J(u)$ using $W_n$ and $Z_n$ and average the two
estimates. Then determine the $\omega$-vector used in the rank test from
the two remaining samples, $U_n$ and $V_n$. Also, let
$\{0 = h_{n,0} < h_{n,1} < \cdots < h_{n,q_n} = r_n\}$ be a sequence of
$(q_n+1)$-tuples of integers, and let $y_{(i)}$ be the order statistics of
$W_n$. Finally, let $c_n$ be a sequence of constants and suppress the
dependence of $r_n$ and $c_n$ on the sample size by denoting them by $r$
and $c$. Then the estimate $\hat J_n$ is of the following form:

$$\hat J_n(i/(N-2r+1), W_n) = S\left\{ \frac{1}{y_{(h_{n,j+c})} - y_{(h_{n,j-c})}} - \frac{1}{y_{(h_{n,j+c+1})} - y_{(h_{n,j-c+1})}} \right\}$$

for $h_{n,j}/r < i/(n-2r+1) < h_{n,j+1}/r$, $j = 2, \ldots$, $i = 1, \ldots, n-2r$,
and $= 0$ otherwise, where $S$ is a constant.

The definition is completed by taking $\hat J_n(u)$ to be constant on the
intervals $[(i-1)/(n-2r),\ i/(n-2r)]$, $i = 1, \ldots, n-2r$. Form a similar
estimate based on $\{Z_n\}$ and average the two estimates. Van Eeden's
procedure differs from this by monotonizing the function $\hat J_n(u)$, and
both Hajek and Van Eeden subtract the mean of the $\hat J_n(i/(n-2r+1))$'s
from each estimated coefficient. The resulting two sample test can be
written in the following form:

$$T_n(U,V) = \sum_{i=1}^{n-r} \hat J_n(i/(n-2r+1))\,\omega_{n,i},$$

where for $i = 1, \ldots, n-r$, $\omega_{n,i}$ is the indicator of the i-th
order statistic of the combined sample $\{U, V\}$.

We now study the differences between the estimator proposed
here and the estimator developed by Hajek. There are three basic
differences, two of which relate directly to the estimate of the
score function, and a third (which will be discussed later) which
deals with the construction of the R-estimate. To facilitate the
comparison we will explicitly write the estimate proposed here as a
function of the data. (We will omit for now the fact that both
estimates are averages of independent estimates from the two
samples.)

Letting $D_{k(n)}(a)$ be the distance from $a$ to its $k(n)$-th
nearest neighbor of $X$, we have


$$\begin{aligned}
\hat L_n(i/(n+1)) &= [\hat f(\hat G(\lambda_{j-1})) - \hat f(\hat G(\lambda_j))]/(\lambda_j - \lambda_{j-1}) \\
&= [\hat f(X_{(P(j-1))}) - \hat f(X_{(P(j))})]/(\lambda_j - \lambda_{j-1}) \\
&= [k(n)/(2n D_{k(n)}(X_{(P(j-1))})) - k(n)/(2n D_{k(n)}(X_{(P(j))}))]/(\lambda_j - \lambda_{j-1}) \\
&= B\left[ \frac{1}{X_{(P(j-1)+n(1))} - X_{(P(j-1)-n(2))}} - \frac{1}{X_{(P(j)+n(3))} - X_{(P(j)-n(4))}} \right]
\end{aligned}$$

for $P(j-1) < i < P(j)$, where $n(1)$ and $n(2)$ (and similarly $n(3)$ and
$n(4)$) denote the number of data points from $X_{(P(j-1))}$ to the left and
to the right, which are passed in order to symmetrically enclose $k(n)$
data points. Here $B$ is a constant depending only on the sample size.
Note that with probability one, one of the end points of the interval
$(X_{(P(j-1)-n(2))},\ X_{(P(j-1)+n(1))})$
is not a point of the data set. (See Figure 1.)
[Figure 1: the interval extending a distance $D_{k(n)}(X_{(P(i))})$ on each
side of $X_{(P(i))}$, with the order statistics $X_{(P(i)-n(2))}$, $X_{(P(i))}$,
and $X_{(P(i)+n(1))}$ marked.]

We recall that

$$\hat J_n(i/(N-2r+1), W_n) = S\left\{ \frac{1}{y_{(h_{n,j+c})} - y_{(h_{n,j-c})}} - \frac{1}{y_{(h_{n,j+c+1})} - y_{(h_{n,j-c+1})}} \right\}$$

for some constant $S$.

The similarities and differences between the proposed
coefficients and the coefficients of Hajek's test can now be explained.
Ignoring the leading constants, we concentrate on the four order
statistics appearing in each of the estimates.

We have from Hajek that $(h_{n,j+1} - h_{n,j})/r \sim r^{-1/6}$. Thus:

1) The distances between the points at which $\hat J_n(u)$ changes
approach zero asymptotically.

2) The number of $\omega_i$ that have the same coefficient in Hajek's
test is asymptotically $(h_{n,j+1} - h_{n,j})/(1/(n-2r+1))$, which
approaches infinity.

We see that the Hajek test is similar to a grouped rank test with the
number of groups slowly increasing to infinity, at the rate $r^{1/6}$.

The following similarities and differences are now apparent:
1) In Hajek's test the score function is estimated at
asymptotically infinitely many points and will therefore be
asymptotically closer to $J(u)$ than $\hat L_n(u)$ for most values of $u$.

2) In the tails, where good estimation is vital for robustness,
the Hajek interval $(y_{(h_{n,j-c})},\ y_{(h_{n,j+c})})$ will have
$y_{(h_{n,j})}$ close to one of the two end points, since it will take a
longer interval to find $c_n$ order statistics in one direction than in
the other. The corresponding interval for the proposed estimate, i.e.,
$(X_{(P(j)-n(2))},\ X_{(P(j)+n(1))})$, is symmetric about $X_{(P(j))}$.
For small samples this should be an advantage for the proposed estimate.

3) The estimated coefficients of Hajek's test and the test
proposed here both involve estimates of the derivative of the density at
a point u. In both approaches this is accomplished by taking the
difference of estimates of the density at the end points of an
interval containing u. The interval used for the determination of the
proposed coefficients is much wider than that used in Hajek's method.
This may contribute to a more stable estimate for small sample sizes.

4) Finally, and most importantly, Hajek's estimate is based on
$r_n$ data points, where $r_n/n \to 0$. The estimate here is based on the
total sample.

In  summary,  the  two  major  differences  are: 

1)  The  proposed  test  sacrifices  some  asymptotic  efficiency  for 
small  sample  performance. 
2)  The  proposed  test  uses  the  entire  sample  instead  of  a
vanishingly  small  fraction  to  estimate  J(u)  and  should 
therefore  be  more  stable. 

2.5.  Convergence  Properties 

Now  that  the  estimate  has  been  defined,  we  need  to  examine 
its convergence properties. We will show that the above adaptive
rank test coefficients converge to Gastwirth's asymptotically most
powerful grouped rank test (AMPGRT) coefficients for a wide class of
distributions.

We  will  need  the  following  regularity  assumptions: 

1)  F  has  a  continuous  density  f  >  0  for  all  x. 

2)  f  is  bounded. 

3)  f'  exists  and  is  bounded  for  all  x. 

4) $0 < \int (f'/f)^2 f\,dx = I(f) < \infty$.

Let $\hat f$ be any location equivariant density function estimate which
converges strongly, uniformly to $f$ (e.g., the nearest neighbor
density estimate, Devroye and Wagner [6]). Let $X_1, \ldots, X_n$ be i.i.d.
$\sim F(x)$. Let $\hat F^{-1} = \hat G$ be defined as before.

Note: Since we are assuming $f'$ exists and is bounded for all x, we
have as a by-product that $f$ is Lipschitz of order one and therefore
absolutely continuous, and therefore uniformly continuous.

LEMMA 1. Under the additional assumption that $k(n) > \sqrt{n \log n}$ we
have

$$[\hat L_n(u) - L(u)] \to 0 \quad \text{w.p. 1},$$

where

$$L(u) = [f(G(\lambda_{i-1})) - f(G(\lambda_i))]/(\lambda_i - \lambda_{i-1}).$$

PROOF. In the following we will often use the elementary fact that
if, as $n \to \infty$, $X_n \to X$ w.p. 1 and $Y_n \to Y$ w.p. 1, then
$X_n + Y_n \to X + Y$ w.p. 1.
It is well known that convergence w.p. 1 is preserved by continuous
functions, that is, if $h(u)$ is continuous and $X_n \to X$ w.p. 1, then
$h(X_n) \to h(X)$ w.p. 1. Devroye and Wagner [6] show that if the two
conditions

$$D_{k(n)}(u) \to 0 \quad \text{w.p. 1 as } n \to \infty, \tag{2.5.1}$$

where $D_{k(n)}$ is as defined in Section 2.4, and

$$n\,(D_{k(n)}(u))^2 / \log n \to \infty \quad \text{w.p. 1 as } n \to \infty, \tag{2.5.2}$$

are satisfied, then

$$\sup_u |\hat f(u) - f(u)| \to 0 \quad \text{w.p. 1 as } n \to \infty. \tag{2.5.3}$$
u 
From Theorem 1 of Moore and Yackel [21], from a result of Kiefer
[18], we have that if $k(n)/\log\log n \to \infty$, then

$$k(n)/(n H(D_{k(n)}(u))) \to 1 \quad \text{w.p. 1}, \tag{2.5.4}$$

where

$$H(D_{k(n)}(u)) = \int_{S(D_{k(n)}(u))} f(x)\,dx,$$

where $S(D_{k(n)}(u))$ = interval of length $2D_{k(n)}$ centered at $u$. From
(2.5.4) we have that $H(D_{k(n)}(u)) \to 0$ w.p. 1. Since $f(x) > 0$ for all
$x$, we clearly also have $D_{k(n)}(u) \to 0$ w.p. 1. From the same work we
also have that

$$2D_{k(n)}(u) \inf_{x \in S(D_{k(n)}(u))} f(x) \le H(D_{k(n)}(u)) \le 2D_{k(n)}(u) \sup_{x \in S(D_{k(n)}(u))} f(x). \tag{2.5.5}$$

Also by (2.5.4) and (2.5.5) we have that $n D_{k(n)}(u)/k(n) \to 1$ w.p. 1.
Therefore, $n^2 D_{k(n)}(u)^2/k(n)^2 \to 1$ w.p. 1. Thus to satisfy
(2.5.2) above we need $n^2/k(n)^2$ to be $< n/\log n$. This is equivalent
to $k(n) > \sqrt{n \log n}$. Therefore, we have shown that $\hat f(u)$
converges to $f(u)$ w.p. 1 uniformly.

Now consider $\hat f(\hat G(u)) - f(G(u))$. This is equal to
$\hat f(\hat G(u)) - f(\hat G(u)) + f(\hat G(u)) - f(G(u))$. We will show
that each of these two differences converges to 0 w.p. 1. The first
difference converges to 0 w.p. 1 by the strong uniform convergence of
nearest neighbor density estimators demonstrated above. The fact that
$\hat G(u) - G(u)$ converges to 0 w.p. 1 is stated previously. Then since
convergence w.p. 1 is preserved by continuous functions, we have that
$f(\hat G(u)) - f(G(u))$ converges to 0 w.p. 1. Dividing by
$\lambda_i - \lambda_{i-1}$ does not change the convergence and the
desired result follows. Q.E.D.

COROLLARY. $\hat L_n(u) \to L(u)$ in probability as $n \to \infty$.

2.6.  Formation  of  the  Test  Statistic 

Our final goal is to use the coefficients developed above to
create a test statistic. This statistic will have the basic form

$$\sum_{i=1}^{N} \hat L_n(i/(N+1))\,\omega_i .$$

It is necessary to determine the distributional properties of this
random variable in order to use it in testing situations. The
statistic may be used conditionally on the estimated coefficients to
produce a level $\alpha$ test for finite n. However, we are going to
propose a slightly different procedure.

We would like to normalize the test statistic in order to
provide a large sample version of the statistic which can exploit the
approximate normality. That is, we wish to modify the statistic to
have mean zero and asymptotic variance one under the null hypothesis
of no change in location. Then in order to determine the level $\alpha$
critical value for the test, we will generate a large number of
repetitions of the modified statistic and determine its $1-\alpha$
percentile empirically. The following lemmas are helpful in allowing
us to subtract the mean from the above statistic, producing a statistic
with mean 0.
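The empirical determination of a critical value described in this paragraph might be sketched as follows. This is a hedged illustration rather than the procedure actually used: the function names, the stand-in statistic `centered_rank_sum`, the number of repetitions, and the choice of a standard normal sampler for the null distribution are all assumptions, since the text does not specify them at this point.

```python
import random


def empirical_critical_value(statistic, sampler, n, alpha, reps, seed=0):
    """Estimate the 1 - alpha percentile of statistic(x, y) under the null.

    Both samples of size n are drawn from the same sampler, so there is
    no difference in location between them.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        x = [sampler(rng) for _ in range(n)]
        y = [sampler(rng) for _ in range(n)]
        values.append(statistic(x, y))
    values.sort()
    index = min(int((1.0 - alpha) * reps), reps - 1)
    return values[index]


def centered_rank_sum(x, y):
    """Stand-in statistic: sum of (i/(N+1) - 1/2) over positions held by y."""
    labelled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n_total = len(labelled)
    return sum(
        (i / (n_total + 1) - 0.5) * label
        for i, (_, label) in enumerate(labelled, start=1)
    )


crit = empirical_critical_value(
    centered_rank_sum,
    lambda rng: rng.gauss(0.0, 1.0),  # assumed null distribution
    n=20,
    alpha=0.05,
    reps=2000,
)
print(crit)
```

In practice the adaptive statistic would replace the stand-in, and the observed value would be compared with the returned percentile.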


LEMMA 2. If the fractiles $\{\lambda_1, \ldots, \lambda_k\}$ are symmetric
about 1/2, that is, if $\lambda_i = 1 - \lambda_{k+1-i}$, and if $f$ is
symmetric about 0, that is, $f(x) = f(-x)$, and finally, if
$c_i = c_{k+2-i}$, then

$$\sum_{i=1}^{k+1} c_i [f(G(\lambda_{i-1})) - f(G(\lambda_i))] = 0 .$$

PROOF. Assume k is odd. Since $f$ is symmetric, $G(u) = -G(1-u)$. Then
we have

$$\begin{aligned}
[f(G(\lambda_{i-1})) - f(G(\lambda_i))] &= [f(G(1-\lambda_{k+1-(i-1)})) - f(G(1-\lambda_{k+1-i}))] \\
&= [f(-G(\lambda_{k+1-(i-1)})) - f(-G(\lambda_{k+1-i}))] \\
&= -[f(G(\lambda_{k+1-i})) - f(G(\lambda_{k+2-i}))] .
\end{aligned}$$

The result follows since $c_i = c_{k+2-i}$. The case for k even is
similar. Q.E.D.
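Lemma 2 can be checked numerically for a particular symmetric density. The sketch below uses the standard normal, for which $G = \Phi^{-1}$ and $f(G(u)) \to 0$ as $u \to 0$ or $u \to 1$; the function name `lemma2_sum`, the fractiles, and the coefficient values are illustrative assumptions.

```python
from statistics import NormalDist


def lemma2_sum(interior_fractiles, c):
    """Sum of c_i [f(G(lambda_{i-1})) - f(G(lambda_i))] for the standard normal."""
    nd = NormalDist()
    lambdas = [0.0] + list(interior_fractiles) + [1.0]

    def f_of_g(u):
        # f(G(u)) tends to 0 at the end points u = 0 and u = 1.
        if u <= 0.0 or u >= 1.0:
            return 0.0
        return nd.pdf(nd.inv_cdf(u))

    return sum(
        c[i - 1] * (f_of_g(lambdas[i - 1]) - f_of_g(lambdas[i]))
        for i in range(1, len(lambdas))
    )


fractiles = [0.1, 0.3, 0.5, 0.7, 0.9]            # k = 5, symmetric about 1/2
symmetric_c = [2.0, -1.0, 0.5, 0.5, -1.0, 2.0]   # c_i = c_{k+2-i}
total = lemma2_sum(fractiles, symmetric_c)
# Without the symmetry of the c_i the sum need not vanish.
asymmetric_total = lemma2_sum(fractiles, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(total, asymmetric_total)
```

The first sum vanishes up to rounding error, while the second does not, which shows that the condition on the $c_i$ is doing real work.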

Recalling that $N = 2n$, let

$$\bar L = \sum_{i=1}^{2n} L(i/(N+1))/N = \sum_{i=1}^{k+1} L(\lambda_i)(P(i) - P(i-1) + 1)/N$$

and

$$\bar L_n = \sum_{i=1}^{2n} \hat L_n(i/(N+1))/N = \sum_{i=1}^{k+1} \hat L_n(\lambda_i)(P(i) - P(i-1) + 1)/N, \tag{2.6.1}$$

where again $P(i) = [N\lambda_i] + 1$. We note that the results presented
here  have  specified  equal  sample  sizes  for  notational  convenience. 
The  derived  results  remain  true  for  the  more  general  situation  under 
simple  conditions,  e.g.,  the  bounding  of  the  limiting  proportion  of 
each  of  the  two  samples  away  from  zero. 

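LEMMA 3. Under the conditions of Lemma 1 and Lemma 2, $\bar L_n$
converges in probability to 0.

PROOF. We have

$$\begin{aligned}
\bar L_n &= \sum_{i=1}^{k+1} \hat L_n(\lambda_i)(P(i) - P(i-1) - 1)/N \\
&= \sum_{i=1}^{k+1} \frac{[\hat f(\hat G(\lambda_{i-1})) - \hat f(\hat G(\lambda_i))]}{(\lambda_i - \lambda_{i-1})} \cdot \frac{P(i) - P(i-1) - 1}{N} .
\end{aligned} \tag{2.6.2}$$

We have

$$\frac{P(i) - P(i-1) - 1}{N(\lambda_i - \lambda_{i-1})} = \frac{[N\lambda_i] - [N\lambda_{i-1}] - 1}{N\lambda_i - N\lambda_{i-1}} \to 1 \quad \text{as } n \to \infty .$$

Also, we have shown that $\hat f(\hat G(\lambda_i))$ converges in
probability to $f(G(\lambda_i))$. Therefore, $\bar L_n$ given by (2.6.2)
converges in probability, as $n \to \infty$, to

$$\sum_{i=1}^{k+1} [f(G(\lambda_{i-1})) - f(G(\lambda_i))], \tag{2.6.3}$$

which $= 0$ by Lemma 2 with $c_i = 1$ for all i. Q.E.D.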

Our test statistic now has the modified form

$$T = \sum_{i=1}^{N} \bigl(\hat L_n(i/(N+1)) - \bar L_n\bigr)\,\omega_i . \tag{2.6.4}$$

Henceforth, $\hat L_N$ will indicate an estimate based on a simple
average of estimates from each of the two samples of size n.
Similarly, $\bar L_N$ will represent (2.6.1) with $\hat L_n$ replaced by
$\hat L_N$. To show that the modified statistic has mean zero, we need
one result.

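LEMMA 4. $\omega_i$ and $\hat L_N(j/(N+1))$ are pairwise independent, for
all i, j, when there is no difference in location in the two samples.

PROOF. We have

$$\text{Prob}\{\hat L_N(u) \in (a,b);\ \omega_i = 1\} + \text{Prob}\{\hat L_N(u) \in (a,b);\ \omega_i = 0\} = \text{Prob}\{\hat L_N(u) \in (a,b)\} .$$

We also have under the null hypothesis that

$$\text{Prob}\{\hat L_N(u) \in (a,b);\ \omega_i = 1\} = \text{Prob}\{\hat L_N(u) \in (a,b);\ \omega_i = 0\} .$$

Therefore,

$$\text{Prob}\{\hat L_N(u) \in (a,b);\ \omega_i = 1\} = \tfrac{1}{2}\,\text{Prob}\{\hat L_N(u) \in (a,b)\} = \text{Prob}\{\omega_i = 1\} \cdot \text{Prob}\{\hat L_N(u) \in (a,b)\} . \quad \text{Q.E.D.}$$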

We now show that the test statistic given by (2.6.4) does
indeed have mean 0.

LEMMA 5. Let

$$\hat T_N = \sum_{i=1}^{N} \bigl(\hat L_N(i/(N+1)) - \bar L_N\bigr)\,\omega_i .$$

Then, under the conditions of Lemma 1 and Lemma 2, if there is no
difference in location between the samples, $E(\hat T_N) = 0$.

PROOF. By the independence shown in Lemma 4,

$$\begin{aligned}
E(\hat T_N) &= \sum_{i=1}^{N} E\{(\hat L_N(i/(N+1)) - \bar L_N)\,\omega_i\} \\
&= \sum_{i=1}^{N} E\{(\hat L_N(i/(N+1)) - \bar L_N)\,E(\omega_i)\} \\
&= (1/2) \sum_{i=1}^{N} E\{(\hat L_N(i/(N+1)) - \bar L_N)\} \\
&= 0,
\end{aligned}$$

since the deviations of the coefficients from their mean $\bar L_N$ sum
to zero. Q.E.D.

Now that we can assure ourselves of a statistic that has mean
0, we would like to modify the test statistic so that it has
asymptotic variance 1. To do this we determine the asymptotic standard
deviation and divide the test statistic by an estimate of this
constant. We will, in fact, show that the asymptotic variance is

$$\frac{1}{4} \sum_{i=1}^{k+1} [f(G(\lambda_{i-1})) - f(G(\lambda_i))]^2/(\lambda_i - \lambda_{i-1}) .$$

Therefore, if we can find a function of the data which converges to
the above variance, we can divide by the square root of that function
of the data so that the resulting test statistic will have asymptotic
variance 1.

LEMMA 6. Under the conditions of Lemma 1 and Lemma 2,

$$\sum_{i=1}^{N} [\hat L_N(i/(N+1)) - \bar L_N]^2/N$$

converges in probability to

$$\sum_{i=1}^{k+1} [f(G(\lambda_{i-1})) - f(G(\lambda_i))]^2/(\lambda_i - \lambda_{i-1}) .$$

PROOF.

$$\begin{aligned}
\sum_{i=1}^{N} [\hat L_N(i/(N+1)) - \bar L_N]^2/N
&= \Bigl\{ \sum_{i=1}^{N} [\hat L_N(i/(N+1))]^2 - 2\bar L_N \sum_{i=1}^{N} \hat L_N(i/(N+1)) + \sum_{i=1}^{N} \bar L_N^2 \Bigr\}/N \\
&= \sum_{i=1}^{N} [\hat L_N(i/(N+1))]^2/N - \bar L_N^2 .
\end{aligned}$$

We have shown that $\bar L_N$ converges in probability to 0. Also,
since $x^2$ is a continuous function, and continuity preserves
convergence in probability, we have that $\bar L_N^2$ converges to 0 in
probability. Therefore, we can concentrate on the quantity

$$\begin{aligned}
\sum_{i=1}^{N} \hat L_N(i/(N+1))^2/N
&= \sum_{j=1}^{k+1} \{(\hat L_n(\lambda_j, X) + \hat L_n(\lambda_j, Y))/2\}^2 (P(j) - P(j-1) - 1)/N \\
&= \sum_{j=1}^{k+1} \{(\hat L_n(\lambda_j, X) + \hat L_n(\lambda_j, Y))/2\}^2 ([N\lambda_j] - [N\lambda_{j-1}] - 1)/N,
\end{aligned} \tag{2.6.5}$$

where $\hat L_n(\lambda_j, X)$ is written to explicitly indicate the
dependence on the first sample, and similarly for
$\hat L_n(\lambda_j, Y)$. We have that $\hat L_n(\lambda_j, X)^2$ and
$\hat L_n(\lambda_j, Y)^2$ both converge in probability to
$L(\lambda_j)^2$ and therefore $\sum_{j=1}^{k+1} \hat L_N(\lambda_j)^2$
converges in probability to $\sum_{j=1}^{k+1} L(\lambda_j)^2$. For large N
we can replace $([N\lambda_j] - [N\lambda_{j-1}] - 1)/N$ by
$\lambda_j - \lambda_{j-1}$ so that expression (2.6.5) is asymptotically
equivalent to
$\sum_{j=1}^{k+1} \hat L_N(\lambda_j)^2 (\lambda_j - \lambda_{j-1})$.
Hence, as $n \to \infty$, $\sum_{i=1}^{N} \hat L_N(i/(N+1))^2/N$ converges in
probability to $\sum_{j=1}^{k+1} L(\lambda_j)^2 (\lambda_j - \lambda_{j-1})$,
which equals
$\sum_{j=1}^{k+1} [f(G(\lambda_{j-1})) - f(G(\lambda_j))]^2/(\lambda_j - \lambda_{j-1})$.
Q.E.D.
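The limit appearing in Lemma 6 can be evaluated directly when the density is known. As an illustration only, the sketch below computes $\sum_j [f(G(\lambda_{j-1})) - f(G(\lambda_j))]^2/(\lambda_j - \lambda_{j-1})$ for the standard normal with two nested sets of fractiles. The function name and the fractile choices are assumptions; the comparison with 1, the Fisher information $I(f)$ of the standard normal, is included to indicate how much is given up by grouping.

```python
import math
from statistics import NormalDist


def grouped_information(interior_fractiles):
    """Sum of [f(G(l_{j-1})) - f(G(l_j))]^2 / (l_j - l_{j-1}), standard normal."""
    nd = NormalDist()
    lambdas = [0.0] + list(interior_fractiles) + [1.0]

    def f_of_g(u):
        # f(G(u)) tends to 0 at the end points u = 0 and u = 1.
        if u <= 0.0 or u >= 1.0:
            return 0.0
        return nd.pdf(nd.inv_cdf(u))

    return sum(
        (f_of_g(lambdas[j - 1]) - f_of_g(lambdas[j])) ** 2
        / (lambdas[j] - lambdas[j - 1])
        for j in range(1, len(lambdas))
    )


one_cut = grouped_information([0.5])                 # two groups
three_cuts = grouped_information([0.25, 0.5, 0.75])  # four groups
print(one_cut, three_cuts, 2.0 / math.pi)
```

With a single cut at 1/2 the value is $2/\pi$; refining the partition increases it toward 1, so a modest number of groups already recovers most of the information.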


Thus a studentized version of the original test statistic is

$$T_N = \frac{\sum_{i=1}^{N} [\hat L_N(i/(N+1)) - \bar L_N]\, \omega_i / N^{1/2}}{(1/2) \{ \sum_{i=1}^{N} [\hat L_N(i/(N+1)) - \bar L_N]^2 / N \}^{1/2}} .$$

We have shown that the numerator of $T_N$ has mean zero. We will show
later that $T_N$ has asymptotic variance 1.
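The studentization step can be sketched numerically. The following is a minimal illustration only, assuming the scores $\hat L_N(i/(N+1))$ are already available as a list and that `omega` holds the 0/1 sample-membership indicators; the function name `studentized_statistic` is hypothetical, and the factor 1/2 in the denominator follows the reading of the displayed formula given above.

```python
import math


def studentized_statistic(scores, omega):
    """Centered linear rank statistic divided by its estimated standard deviation.

    scores: estimated score values, one per rank i = 1..N (assumed given).
    omega:  0/1 indicators, 1 if the i-th order statistic of the combined
            sample comes from the second sample.
    """
    n_total = len(scores)
    mean_score = sum(scores) / n_total
    centered = [s - mean_score for s in scores]
    # Numerator: sum of centered scores over second-sample ranks, scaled by N^(1/2).
    numerator = sum(c * w for c, w in zip(centered, omega)) / math.sqrt(n_total)
    # Denominator: (1/2) * {mean squared centered score}^(1/2).
    denominator = 0.5 * math.sqrt(sum(c * c for c in centered) / n_total)
    return numerator / denominator
```

Because the scores are centered before use, adding a constant to every score leaves the value unchanged, which is the reason the mean is subtracted in the text.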
2.7.  Location  and  Scale  Equivariance  of  the  Test  Statistic


Consider the test statistic

$$T_N = \frac{\sum_{i=1}^{N} \hat L_N(i/(N+1))\, \omega_i}{[(1/4) \sum_{i=1}^{N} \hat L_N(i/(N+1))^2]^{1/2}}$$

where

$$\hat L_N(i/(N+1)) = [\hat L_n(i/(N+1), X) + \hat L_n(i/(N+1), Y)]/2$$

and

$$\hat L_n(i/(N+1), X) = [\hat f(\hat G(\lambda_{j-1})) - \hat f(\hat G(\lambda_j))] / (\lambda_j - \lambda_{j-1})$$

for $\lambda_{j-1} < i/(N+1) \le \lambda_j$ (similarly for $\hat L_n(i/(N+1), Y)$). We have $\hat G_{cX+d}(u) = c\, \hat G_X(u) + d$ if $c > 0$. Similarly with Y replacing X. Also,

$$\hat f_{cX+d}(cu+d) = \hat f_{cX}(cu) = (1/c)\, \hat f_X(u) ,$$

since every nearest neighbor distance $D_{k(N)}$ is multiplied by c. Similarly with Y replacing X. Thus,

$$\hat L_n(u, cX+d) = (1/c)\, \hat L_n(u, X) .$$

Similarly we have $\hat L_n(u, cY+d) = (1/c)\, \hat L_n(u, Y)$. Clearly, $\omega_i(cX+d, cY+d) = \omega_i(X, Y)$ for $c > 0$. The denominator $[(1/4) \sum_{i=1}^{N} \hat L_N(i/(N+1))^2]^{1/2}$ also clearly becomes multiplied by 1/c. Therefore, for $c > 0$, the test statistic will not change. If we were to change $\hat L_N(i/(N+1))$ by subtracting its mean $\bar L_N$, all the equivariance properties shown above would remain, since the average of equivariant terms is also equivariant. Equivariance under a location shift will be used later to help demonstrate the asymptotic properties of the location estimate resulting from $T_N$.
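The two invariance facts used above can be checked on a small example. This is a sketch under stated assumptions: the nearest neighbor density estimate is taken to be $k/(2 N D_k(x))$, where $D_k(x)$ is the distance from x to its k-th nearest sample point (the normalizing constant is an assumption, and it does not affect the 1/c scaling), and the helper names are hypothetical.

```python
def membership_indicators(x, y):
    """omega vector: 1 where the i-th order statistic of the combined
    sample comes from the second sample y, 0 where it comes from x."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    return [label for _, label in combined]


def knn_density(point, sample, k):
    """k nearest neighbor density estimate k / (2 N D_k), with D_k the
    distance from `point` to its k-th nearest observation (assumed form)."""
    distances = sorted(abs(point - v) for v in sample)
    return k / (2.0 * len(sample) * distances[k - 1])


def affine(sample, c, d):
    """Apply the map v -> c*v + d to every observation."""
    return [c * v + d for v in sample]
```

For c > 0, `membership_indicators(affine(x, c, d), affine(y, c, d))` equals `membership_indicators(x, y)`, and `knn_density(c*u + d, affine(x, c, d), k)` equals `knn_density(u, x, k) / c` up to rounding, which are the two properties the argument relies on.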
3.  ASYMPTOTIC  PROPERTIES  OF  PROPOSED  RANK  TEST

3.1.  Asymptotic  Equivalence  of  the  Test  Statistic  and  the 

Asymptotically  Most  Powerful  Grouped  Rank  Test  Statistic 

Let us assume that $\{X\} = \{X_1, \ldots, X_n\}$, $\{Y\} = \{Y_1, \ldots, Y_n\}$ are two samples of i.i.d. observations from the c.d.f. F. Let F satisfy the following regularity conditions:

1) F has a continuous density $f > 0$ for all x.

2) $f'$ is bounded for all x.

3) $0 < \int (f'/f)^2 f\, dx = I(f) < \infty$.

4) f is symmetric about 0.

5) Finally, assume $\lambda_i = 1 - \lambda_{k+1-i}$, $i = 1, 2, \ldots, k$.

Let $\hat G(u) = \hat F^{-1}(u) = x_{([nu]+1)}$, and let $\hat f(x)$ be the k(N) nearest neighbor density function estimator. Let $\hat L_N(u) = (\hat L_n(u, X) + \hat L_n(u, Y))/2$, where

$$\hat L_n(u, X) = [\hat f(\hat G(\lambda_{i-1})) - \hat f(\hat G(\lambda_i))] / (\lambda_i - \lambda_{i-1}) \quad \text{for } \lambda_{i-1} < u \le \lambda_i ,$$

and similarly for $\hat L_n(u, Y)$. Let

$$(3.1.1)\quad L^*(u) = \hat L_N(u) - \bar L_N ,$$

where $\bar L_N$ is given by (2.6.1) with $L_N$ replaced by $\hat L_N$.

Note: The number of fractiles k should not be confused with k(N), the number of nearest neighbors used.

Let

$$T^* = \sum_{i=1}^{N} L^*(i/(N+1))\, \omega_i / N^{1/2}$$

and

$$S = \sum_{i=1}^{N} L(i/(N+1))\, \omega_i / N^{1/2} .$$

Thus $T^*$ is the proposed test and S is the AMPGRT.

THEOREM 1. Assuming conditions 1) through 5) above hold, we have that $T^* - S \to 0$ in probability.


PROOF. We have

$$(3.1.2)\quad T^* - S = \sum_{i=1}^{k+1} \frac{L^*(\lambda_i) - L(\lambda_i)}{N^{1/2}} \sum_{j=P(i-1)+1}^{P(i)} \omega_j .$$

We first show that the mean of (3.1.2) is asymptotically 0, so that we can subtract its mean without changing its convergence properties. This is equivalent to subtracting .5 from each of the $\omega_j$'s in equation (3.1.2). Thus we would like to show that

$$\sum_{i=1}^{k+1} \frac{L^*(\lambda_i)}{N^{1/2}} \Big[ \sum_{j=P(i-1)+1}^{P(i)} (1/2) \Big] - \sum_{i=1}^{k+1} \frac{L(\lambda_i)}{N^{1/2}} \Big[ \sum_{j=P(i-1)+1}^{P(i)} (1/2) \Big]$$

is asymptotically zero. By definition (3.1.1),

$$(3.1.3)\quad \sum_{i=1}^{k+1} L^*(\lambda_i) (P(i) - P(i-1) - 1) / 2N^{1/2} = 0 ,$$

since we have subtracted the mean from the coefficients. Therefore, it remains to show that

$$A_n = \sum_{i=1}^{k+1} \frac{L(\lambda_i)}{N^{1/2}} \Big[ \sum_{j=P(i-1)+1}^{P(i)} (1/2) \Big]$$

is asymptotically zero.

We have shown by Lemma 2 that

$$\sum_{i=1}^{k+1} c_i [f(G(\lambda_{i-1})) - f(G(\lambda_i))] = 0$$

when $c_i = c_{k+2-i}$ for all i. Since $\lambda_i = 1 - \lambda_{k+1-i}$, we have for each i

$$\frac{N\lambda_i - N\lambda_{i-1} - 1}{\lambda_i - \lambda_{i-1}} = \frac{N(1 - \lambda_{k+1-i}) - N(1 - \lambda_{k+2-i}) - 1}{\lambda_{k+2-i} - \lambda_{k+1-i}} .$$

Therefore the sequence of coefficients

$$c_i = (N\lambda_i - N\lambda_{i-1} - 1) / 2(\lambda_i - \lambda_{i-1})$$

possesses the property that $c_i = c_{k+2-i}$ for all i, and hence by Lemma 2,

$$\sum_{i=1}^{k+1} [(N\lambda_i - N\lambda_{i-1} - 1) / 2(\lambda_i - \lambda_{i-1})] [f(G(\lambda_{i-1})) - f(G(\lambda_i))] = 0 .$$

After division by $N^{1/2}$, the above expression may be rewritten as $\sum_{i=1}^{k+1} L(\lambda_i)(N\lambda_i - N\lambda_{i-1} - 1) / 2N^{1/2}$, which is asymptotically equal to $A_n$. Therefore,

$$\sum_{i=1}^{k+1} \frac{[L^*(\lambda_i) - L(\lambda_i)]}{N^{1/2}} (P(i) - P(i-1) - 1)/2$$

converges to 0 as $n \to \infty$. Subtracting this expression from (3.1.2) we obtain

$$(3.1.4)\quad \sum_{i=1}^{k+1} \frac{L^*(\lambda_i) - L(\lambda_i)}{N^{1/2}} \sum_{j=P(i-1)+1}^{P(i)} \omega_j - \sum_{i=1}^{k+1} \frac{[L^*(\lambda_i) - L(\lambda_i)]}{N^{1/2}} \frac{(P(i) - P(i-1) - 1)}{2}
= \sum_{i=1}^{k+1} [L^*(\lambda_i) - L(\lambda_i)] \Big[ \sum_{j=P(i-1)+1}^{P(i)} (\omega_j - 1/2) \Big] / N^{1/2} .$$
We have shown that the difference of (3.1.4) and (3.1.2) converges to 0 in probability. If we can show that

$$(3.1.5)\quad \sum_{j=P(i-1)+1}^{P(i)} (\omega_j - 1/2) / N^{1/2}$$

is bounded in probability, then since $L^*(u)$ converges in probability to $L(u)$ we have that (3.1.4) is the sum of k+1 random variables which converge to 0 in probability, and hence itself converges to 0 in probability. However, it is immediate that (3.1.5) is bounded in probability, since it is clearly a linear rank test statistic. Therefore, by Theorem V.1.6 of Hajek and Sidak we have that (3.1.5) has bounded variance. Hence by Chebyshev's inequality, (3.1.5) must be bounded in probability. Thus we have shown that $T^* - S$ converges to 0 in probability for the wide class of distributions satisfying conditions 1) through 5). Q.E.D.

Now we anticipate a result needed to utilize results of Hodges and Lehmann [12]. For this purpose we introduce the following framework: We consider the two samples X and Y; but instead of a zero value for the shift, d, of the Y's relative to the X's, we let d depend on the total sample size N according to $d_N = 2b/[N\, I(f)]^{1/2}$. In order to use results of Hodges and Lehmann [12], it is necessary to demonstrate that the conclusion of Theorem 1 remains valid under this model. That is, we require the following:

COROLLARY. Assume that we have the sequence of alternatives $d_N = 2b/[N\, I(f)]^{1/2}$ as described above. Then under assumptions 1) through 5) we have that $T^* - S$ converges to 0 in probability as $N \to \infty$.

PROOF. By Theorem VI.2.3, application #1, of Hajek and Sidak [11], we have that (3.1.5) converges to a distribution with finite variance. Therefore, since Lemma 2 is independent of the difference in location between the two samples, the argument in the proof of Theorem 1 remains valid and we have that under the sequence of alternatives above, $T^* - S$ converges to 0 in probability. Q.E.D.

To study the asymptotic distribution of the proposed test $T^*$ we can therefore concentrate on the asymptotic distribution of the AMPGRT, i.e.,

$$S = \sum_{i=1}^{k+1} L(\lambda_i) \Big[ \sum_{j=P(i-1)+1}^{P(i)} \omega_j \Big] / N^{1/2} ,$$

since we have now shown that the difference between $T^*$ and S converges to 0 in probability.


THEOREM 2. If conditions 1) through 5) hold, then $T^*$ has an asymptotic normal distribution with mean 0 and variance

$$(1/4) \int_0^1 L^2(u)\, du .$$


PROOF. Clearly we have that S has the same asymptotic distribution as

$$(3.1.6)\quad S_1 = \sum_{i=1}^{k+1} L(\lambda_i)\, \frac{N\lambda_i - N\lambda_{i-1} - 1}{P(i) - P(i-1) - 1} \Big[ \sum_{j=P(i-1)+1}^{P(i)} \omega_j \Big] / N^{1/2} ,$$

since the middle factor converges to 1 as N goes to infinity. We also have that $E(S_1) = 0$, since

$$\sum_{j=P(i-1)+1}^{P(i)} E(\omega_j) = \sum_{j=P(i-1)+1}^{P(i)} (1/2) = (P(i) - P(i-1) - 1)/2 ,$$

so that (omitting the common factor $N^{-1/2}$)

$$E(S_1) = \sum_{i=1}^{k+1} L(\lambda_i)(N\lambda_i - N\lambda_{i-1})/2 = \sum_{i=1}^{k+1} [f(G(\lambda_{i-1})) - f(G(\lambda_i))]\, N/2 = 0$$

by Lemma 2, where $c_i = N/2$.

It is clear that L(u) is square integrable since it is a finite step function on a bounded interval. Let

$$V_N(i) = [f(G(\lambda_{j-1})) - f(G(\lambda_j))] / (\lambda_j - \lambda_{j-1})$$

for $P(j-1) < i \le P(j)$. Then $V_N(1 + [uN])$ and $L(u)$ are the same step function except that the change in steps occurs at points whose differences are converging to 0. Therefore we clearly have that

$$\lim_{N \to \infty} \int_0^1 [V_N(1 + [uN]) - L(u)]^2\, du = 0 .$$

This satisfies the conditions for Theorem V.1.6 in Hajek and Sidak [11], with their regression constants $c_i$ taking one constant value for $i = 1, \ldots, N/2$ and another for $i = (N/2) + 1, \ldots, N$.

Therefore, we have from this result that S has an asymptotic normal distribution with expectation 0 and variance

$$(1/4) \int_0^1 L(u)^2\, du = (1/4) \sum_{i=1}^{k+1} [f(G(\lambda_{i-1})) - f(G(\lambda_i))]^2 / (\lambda_i - \lambda_{i-1}) .$$

Q.E.D.

As before, it is necessary to establish the corresponding results for the sequence of non-zero alternatives $d_N$. To this end we consider the following:

COROLLARY. Given the sequence of alternatives $d_N = 2b/[N\, I(f)]^{1/2}$, then if conditions 1) through 5) are satisfied, we have that $T^*$ has an asymptotic normal distribution with

$$\text{mean} = \frac{b}{2\, I(f)^{1/2}} \int_0^1 \frac{-L(u)\, f'(G(u))}{f(G(u))}\, du$$

and

$$\text{variance} = (1/4) \int_0^1 L^2(u)\, du .$$

PROOF. By Theorem VI.2.3, application #1, of Hajek and Sidak [11], the desired result follows.


3.2.  Asymptotic  Relative  Efficiency 

We will now demonstrate that as the number of fractiles is increased to infinity, the efficiency of

$$(3.2.1)\quad T^* = \sum_{i=1}^{N} L^*(i/(N+1))\, \omega_i / N^{1/2}$$

compared to the LMPRT converges to 1.

THEOREM 3. Under conditions 1) through 5) of Section 3.1, as $\max[\lambda_i - \lambda_{i-1}]$ goes to 0, the asymptotic relative efficiency of the adaptive test $T^*$ given by (3.2.1) approaches 1.


PROOF. We have shown that

$$T^* = \sum_{i=1}^{k+1} L^*(\lambda_i) \Big[ \sum_{j=P(i-1)+1}^{P(i)} \omega_j \Big] / N^{1/2}$$

has the same asymptotic distribution properties under the null hypothesis and under the sequence of alternatives $d_N$ given by (3.1.9) as the AMPGRT

$$S = \sum_{i=1}^{N} L(i/(N+1))\, \omega_i / N^{1/2} .$$

The asymptotic relative efficiency of $T^*$ is therefore the same as that of S, which, by Gastwirth [8], is

$$I(f)^{-1} \sum_{i=1}^{k(m)+1} [f(G(\lambda_{i-1}(m))) - f(G(\lambda_i(m)))]^2 / (\lambda_i(m) - \lambda_{i-1}(m)) .$$

Now assume that we have sequences $\{\lambda_i(m)\}$ where $m = \max_i \{ \lambda_i(m) - \lambda_{i-1}(m) \}$. The number of elements of each partition is then k(m) + 1. We want to show that

$$I(f)^{-1} \sum_{i=1}^{k(m)+1} [f(G(\lambda_{i-1}(m))) - f(G(\lambda_i(m)))]^2 / (\lambda_i(m) - \lambda_{i-1}(m)) \to 1 \quad \text{as } m \to 0 .$$

Since we have the existence of $f'$, we know from the mean value theorem that

$$f(G(\lambda_i(m))) - f(G(\lambda_{i-1}(m))) = [\lambda_i(m) - \lambda_{i-1}(m)]\, f'(G(U_i)) / f(G(U_i)) ,$$

where $\lambda_{i-1}(m) < U_i < \lambda_i(m)$. Therefore,

$$\sum_{i=1}^{k(m)+1} [f(G(\lambda_{i-1}(m))) - f(G(\lambda_i(m)))]^2 / (\lambda_i(m) - \lambda_{i-1}(m))
= \sum_{i=1}^{k(m)+1} \{ [\lambda_i(m) - \lambda_{i-1}(m)]^2 / [\lambda_i(m) - \lambda_{i-1}(m)] \} \cdot [f'(G(U_i)) / f(G(U_i))]^2
= \sum_{i=1}^{k(m)+1} [\lambda_i(m) - \lambda_{i-1}(m)]\, [f'(G(U_i)) / f(G(U_i))]^2 ,$$

which is the Riemann approximating sum for I(f) as m goes to 0.
Q.E.D.

It is important to point out that even though the A.R.E. $\to$ 1, the rate is a function of the convergence of the Riemann sum to the limiting integral, which will depend on the smoothness of $f'(G(u))/f(G(u))$. However, given a finite set of distributions, e.g., normal, Cauchy, logistic, and Laplace, we can guarantee that we will be within $\varepsilon$ of 1 in A.R.E. for all the distributions under consideration, for any $\varepsilon > 0$, by choosing the finest partition needed for any member in that set.

We  are  sacrificing  some  degree  of  asymptotic  optimality  in 
order  to  achieve  better  small  sample  performance.  We  note  that 
W. Albers [1], in a recent article in the Annals of Statistics, argues that this is inevitable for applied work in this area.
Actually, the score functions of the distributions that are typically
used  in  robustness  investigations  are  very  smooth  functions  and  in  an 
important  sense  we  are  not  giving  up  much  in  terms  of  asymptotic 
optimality.  As  we  see  below,  the  asymptotic  relative  efficiencies, 
for  k  very  small,  are  quite  close  to  1.  We  conjecture  that  for  dis¬ 
tributions  that  satisfy  conditions  1)  through  5)  of  Section  3.1  there 
exists  a  sequence  of  fractiles  depending  on  N  that  will  enable  the 
procedure  described  here  to  have  full  asymptotic  efficiency. 

J. W. Tukey in a private communication mentioned that spacings between fractiles of the order of $N^{-1/3}$ might work.

In order to determine how much efficiency is being sacrificed it is necessary to first compute the asymptotically most powerful grouped rank tests explicitly for these distributions of interest. Here we will only consider k odd and $\lambda_{(k+1)/2} = 1/2$, and concentrate on the four distributions mentioned above.


For the Laplace distribution, since the score function of the asymptotically most powerful rank test is sign(u - 1/2), we have that the LMPRT is also the AMPGRT, since all the $\omega_i$ for $0 < i/(N+1) < 1/2$ have the same coefficient, and similarly for the $\omega_i$ with $1/2 < i/(N+1) < 1$. Therefore, the A.R.E. for all k is equal to 1. For the three other distributions more work is needed. For the logistic distribution we have

$$[f(G(\lambda_{i-1})) - f(G(\lambda_i))] / (\lambda_i - \lambda_{i-1}) = [\lambda_{i-1}(1 - \lambda_{i-1}) - \lambda_i(1 - \lambda_i)] / (\lambda_i - \lambda_{i-1}) .$$

For the Cauchy distribution we have

$$[f(G(\lambda_{i-1})) - f(G(\lambda_i))] / (\lambda_i - \lambda_{i-1}) = (1/\pi) [\cos^2(\pi(\lambda_{i-1} - 1/2)) - \cos^2(\pi(\lambda_i - 1/2))] / (\lambda_i - \lambda_{i-1}) .$$

For the normal distribution we have

$$[f(G(\lambda_{i-1})) - f(G(\lambda_i))] / (\lambda_i - \lambda_{i-1}) = (1/2\pi)^{1/2} [\exp(-(1/2)(\Phi^{-1}(\lambda_{i-1}))^2) - \exp(-(1/2)(\Phi^{-1}(\lambda_i))^2)] / (\lambda_i - \lambda_{i-1}) ,$$

where $\Phi^{-1}$ represents the inverse normal distribution function.

For k = 3 we have for regularly spaced $\lambda_i$:

$\lambda_0 = 0$, $\lambda_1 = .25$, $\lambda_2 = .50$, $\lambda_3 = .75$, $\lambda_4 = 1.00$.

For these $\lambda_i$ the coefficients for the logistic AMPGRT are

L(.25) = -.75, L(.5) = -.25, L(.75) = .25, L(1) = .75.

The asymptotic variance is .3125. The efficiency is .9375.

The coefficients for the Cauchy AMPGRT are

L(.25) = -.636, L(.5) = -.636, L(.75) = .636, L(1) = .636.

The asymptotic variance is .404. The efficiency is .8080.

The coefficients for the normal AMPGRT are

L(.25) = -1.27, L(.5) = -.325, L(.75) = .325, L(1) = 1.27.

The asymptotic variance is .8593. The efficiency is also .8593.
For k = 7 we have for regularly spaced $\lambda_i$:

$\lambda_0 = 0$, $\lambda_1 = .125$, $\lambda_2 = .250$, $\lambda_3 = .375$, $\lambda_4 = .50$,
$\lambda_5 = .625$, $\lambda_6 = .750$, $\lambda_7 = .875$, $\lambda_8 = 1.00$.

The coefficients for the logistic AMPGRT are

L(.125) = -.875, L(.25) = -.625, L(.375) = -.375, L(.500) = -.125,
L(.625) = .125, L(.75) = .375, L(.875) = .625, L(1.00) = .875.

The asymptotic variance is .3282. The efficiency is .9846.

The coefficients for the Cauchy AMPGRT are

L(.125) = -.373, L(.25) = -.899, L(.375) = -.902, L(.500) = -.373,
L(.625) = .373, L(.75) = .902, L(.875) = .899, L(1.00) = .373.

The asymptotic variance is .475. The efficiency is .9500.

The coefficients for the normal AMPGRT are

L(.125) = -1.64, L(.25) = -.889, L(.375) = -.496, L(.500) = -.159,
L(.625) = .159, L(.75) = .496, L(.875) = .889, L(1.00) = 1.64.

The asymptotic variance is .9378. The efficiency is also .9378.
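The coefficient and efficiency calculations above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the fractiles are assumed equally spaced, $f(G(0)) = f(G(1)) = 0$ is used at the end points, and the Fisher information values I(f) = 1/3, 1/2 and 1 for the standard logistic, Cauchy and normal densities are the standard ones rather than values quoted in the text.

```python
import math
from statistics import NormalDist

# Fisher information for location of the standard densities (standard values).
FISHER_INFORMATION = {"logistic": 1.0 / 3.0, "cauchy": 0.5, "normal": 1.0}


def density_at_quantile(name, lam):
    """f(G(lam)): the density evaluated at its own lam-quantile."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0  # all three densities vanish in the extreme tails
    if name == "logistic":
        return lam * (1.0 - lam)
    if name == "cauchy":
        return math.cos(math.pi * (lam - 0.5)) ** 2 / math.pi
    if name == "normal":
        z = NormalDist().inv_cdf(lam)
        return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    raise ValueError("unknown distribution: " + name)


def grouped_scores(name, k):
    """Fractiles lam_0..lam_{k+1} and step-function scores L(lam_1)..L(lam_{k+1})."""
    lams = [i / (k + 1) for i in range(k + 2)]
    vals = [density_at_quantile(name, lam) for lam in lams]
    scores = [(vals[i - 1] - vals[i]) / (lams[i] - lams[i - 1])
              for i in range(1, k + 2)]
    return lams, scores


def grouped_efficiency(name, k):
    """Return (integral of L^2 over [0, 1], that integral divided by I(f))."""
    lams, scores = grouped_scores(name, k)
    variance = sum(s * s * (lams[i + 1] - lams[i]) for i, s in enumerate(scores))
    return variance, variance / FISHER_INFORMATION[name]
```

Calling `grouped_scores("logistic", 3)` and `grouped_efficiency("logistic", 3)` returns the logistic coefficients, variance and efficiency listed for k = 3; for the Cauchy and normal cases the computed variances agree with the quoted figures to within about 0.002.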

Below  the  proposed  test  is  denoted  by  PT  and  its  power  is 
compared  to  that  for  the  LMPRT  on  the  distribution  for  which  that 
test  is  locally  most  powerful,  for  various  distributions.  The  first 
half  of  each  section  below  relates  to  the  determination  of  the  95% 
point  under  the  null  hypothesis  for  each  of  the  two  tests  for  each 
distribution, in order to guarantee that the tests have the same
size. The second half of each section gives the empirical power for
the two tests. The ratio of the powers is also given.

3.3.  Monte  Carlo  Investigation  of  Adaptive  Testing 

In the following table $F^{-1}$ denotes the percentile for that
particular distribution under study. PT denotes the proposed test.
TABLE 1

RESULTS OF SIMULATIONS OF RANK TESTS

Fractiles = {.00, .25, .50, .75, 1.0}
Sample Size = 20 For Each Sample
Each line is based on 15,000 repetitions.

Normal: density $(2\pi)^{-1/2} \exp(-(1/2)(x-d)^2)$; LMPRT = Normal Scores Test

    d = 0      F^{-1}(.95) of LMPRT = 1.62594     F^{-1}(.95) of PT = 1.81353
    d = .40    Pr{LMPRT > 1.62594} = .3455        Pr{PT > 1.81353} = .2936      Ratio = .8497

Cauchy: density $(1/\pi)(1/(1+(x-d)^2))$

    d = 0      F^{-1}(.95) of LMPRT = 1.65900     F^{-1}(.95) of PT = 1.78258
    d = .75    Pr{LMPRT > 1.64900} = .4481        Pr{PT > 1.78258} = .3845      Ratio = .8582

Laplace: density $(1/2)\exp(-|x-d|)$; LMPRT = Median Test

    d = 0      F^{-1}(.95) of LMPRT = 1.700 (exact)   F^{-1}(.95) of PT = 1.74412
    d = .75    Pr{LMPRT > 1.62594} = .6484        Pr{PT > 1.81353} = .6135      Ratio = .9462

Logistic: density $\exp(-(x-d)) / (1+\exp(-(x-d)))^2$; LMPRT = Wilcoxon Two-Sample Test

    d = 0      F^{-1}(.95) of LMPRT = 1.66006     F^{-1}(.95) of PT = 1.79796
    d = .40    Pr{LMPRT > 1.62594} = .3694        Pr{PT > 1.81353} = .3402      Ratio = .9210
4.  FORMATION  OF  ADAPTIVE  R-ESTIMATE
AND  RESULTING  PROPERTIES 

4.1.  The  Derived  Two-Sample  Location  Estimate 

Now  that  we  have  constructed  a  two-sample  test  for  change  in 
location  which  has  some  adaptive  properties,  we  investigate  the 
transfer  of  these  properties  to  the  corresponding  estimation  problem. 
To  illustrate  the  connection  between  non-parametric  testing  and 
estimation,  consider  the  Wilcoxon  two-sample  rank  sum  test  statistic 
which  can  be  written  as 

$$W(X, Y) = \sum_{i=1}^{N} (i/(N+1))\, \omega_i$$

where N = 2n and n is the number of X's and Y's. Now let $\omega_i(e)$ denote the vector of indicator functions which would result from a shift of e in the second sample, so that

$$W(X, Y+e) = \sum_{i=1}^{N} (i/(N+1))\, \omega_i(e) .$$

Let $w_N^* = \inf\{e : W(X, Y+e) > 0\}$ and $w_N^{**} = \sup\{e : W(X, Y+e) < 0\}$. Then $\hat w_N = (w_N^* + w_N^{**})/2$ is known as the Hodges-Lehmann estimator [12] in the two-sample case. This estimator should be thought of as that shift in the second sample which causes the test statistic for change in location to assume its smallest absolute value. This procedure can, of course, be used in almost any location testing framework in order to derive a point estimate of location from a test statistic. In particular, we can use the above procedure for the two-sample test proposed here. A point estimate of location based on a rank test
will  be  called  an  R-estimate.  An  analogous  procedure  may  be  used  for 
the  one-sample  case.  For  this  case,  in  order  to  use  the  theory  out¬ 
lined  above,  which  makes  particular  use  of  the  two-sample  framework, 
two  samples  must  be  artificially  created  from  the  single  sample.  We 
will  concentrate  on  the  method  for  accomplishing  this  below. 
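The inversion of a rank test into a point estimate can be sketched for the Wilcoxon case. The following is a minimal illustration, not the procedure used in the simulations: it assumes the scores are centered as $i/(N+1) - 1/2$ so that the statistic changes sign, it assumes no ties among the pairwise differences, and the function names are hypothetical.

```python
def centered_wilcoxon(x, y_shifted):
    """Sum of (i/(N+1) - 1/2) over ranks i occupied by the second sample."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y_shifted])
    n_total = len(combined)
    return sum((i / (n_total + 1) - 0.5) * omega
               for i, (_, omega) in enumerate(combined, start=1))


def hodges_lehmann_shift(x, y, tol=1e-9):
    """Midpoint of inf{e: W(X, Y+e) > 0} and sup{e: W(X, Y+e) < 0}.

    W(X, Y+e) is a non-decreasing step function of e whose jumps occur at
    the pairwise differences x_i - y_j, so it is enough to evaluate it once
    inside each interval between consecutive sorted differences.
    """
    diffs = sorted(xi - yj for xi in x for yj in y)
    probes = ([diffs[0] - 1.0]
              + [(a + b) / 2.0 for a, b in zip(diffs, diffs[1:])]
              + [diffs[-1] + 1.0])
    values = [centered_wilcoxon(x, [yj + e for yj in y]) for e in probes]
    first_positive = next(t for t, v in enumerate(values) if v > tol)
    last_negative = max(t for t, v in enumerate(values) if v < -tol)
    w_star = diffs[first_positive - 1]      # left end of first interval with W > 0
    w_double_star = diffs[last_negative]    # right end of last interval with W < 0
    return 0.5 * (w_star + w_double_star)
```

Under these assumptions the result coincides with the median of the pairwise differences $x_i - y_j$, which is the familiar closed form of the two-sample Hodges-Lehmann estimate; the brute-force search is shown only because the same inversion applies to test statistics without a closed form.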

Suppose we have the sample $X = x_1, x_2, \ldots, x_n$. The second sample then consists of the negative values of the first sample. Then $\omega_i(e)$ is the indicator of whether the i-th order statistic of the combined sample of $\{x_1 - e, x_2 - e, \ldots, x_n - e\}$ and $\{-x_1 + e, -x_2 + e, \ldots, -x_n + e\}$ comes from the original sample or the second artificial sample. This method is more fully discussed in Huber [16], where the influence curve and breakdown points for R-estimates constructed in this manner are given.

We  now  proceed  with  the  asymptotic  theory  of  the  proposed 

R-estimate. Let

$$(4.1.1)\quad T(X, Y) = \sum_{i=1}^{k+1} L^*(\lambda_i) \Big[ \sum_{j=P(i-1)+1}^{P(i)} \omega_j \Big] ,$$

where $L^*(\lambda_i)$, previously defined by (3.1.1), is again an average of estimates from the X and Y samples. We denote this test statistic by T(X,Y) since it is now important to indicate exactly how changes in X and Y affect the test.


In  Van  Eeden  [31]  the  coefficients  of  the  adaptive  rank 
test  were  monotonized  in  the  construction  of  the  R-estimate.  This 
was  done  in  order  to  use  the  results  of  Hodges  and  Lehmann  [12]. 
The  monotonization  of  the  coefficients  guarantees  that  the  rank  test
will  be  an  increasing  function  of  e  where  e  is  the  displacement  of 
the  second  sample  from  the  first.  This  property  is  used  to  transfer 
statements  concerning  the  test  statistic  to  ones  involving  the  loca¬ 
tion  estimate  which  is  helpful  in  asymptotic  investigations. 

The  coefficients  of  the  adaptive  rank  test  T  developed  here 
are  not  monotonized.  We  will  show  that  for  the  distributions  under 
which  Van  Eeden's  test  is  asymptotically  fully  efficient,  the  coeffi¬ 
cients  of  the  rank  test  proposed  here  will  be  monotonic  for  all  suf¬ 
ficiently  large  N  with  probability  one. 

By  not  arbitrarily  imposing  monotonicity,  we  gain  an  important 
advantage  with  respect  to  the  estimate  of  Van  Eeden  in  terms  of 
robustness  for  the  following  reasons.  All  long-tailed  distributions 
(e.g.,  the  Cauchy)  have  redescending  score  functions,  and  therefore 
the  LMPRTs  for  long-tailed  distributions  have  non-monotonic  coeffi¬ 
cients.  Considerable  attention  is  given  to  long-tailed  distributions 
in  robust  estimation,  and  therefore  the  monotonization  of  the  coeffi¬ 
cients  should  be  avoided  since  it  impairs  the  performance  of  the 
estimate  for  distributions  of  this  type.  In  particular,  imposed  mon¬ 
otonization  prevents  estimates  such  as  Van  Eeden's  from  having  asymp¬ 
totic  full  efficiency  for  distributions  with  redescending  score 
functions.  The  following  lemma  allows  us  to  use  the  results  of  Hodges 
and  Lehmann  for  the  same  distributions  for  which  Van  Eeden's  estimate 
is  asymptotically  fully  efficient,  while  permitting  some  flexibility 
for  estimation  in  long-tailed  situations. 
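LEMMA 7. Under conditions 1) through 5) of Section 3.1, if $L(\lambda_i)$ is strictly increasing with the increasing $\lambda_i$, then $\hat L_N(\lambda_i)$ will also be strictly increasing in i for all sufficiently large N with probability one.

PROOF. From Lemma 1 of Section 2.5 we have that $\hat L_N(u)$ converges to $L(u)$ w.p. 1. By convergence with probability one we have that for all $\varepsilon > 0$

$$(4.1.2)\quad \text{Prob}\{ |\hat L_N(\lambda_i) - L(\lambda_i)| > \varepsilon \text{ infinitely often (i.o.)} \} = 0 .$$

Let $0 < m = \min_i \{ L(\lambda_i) - L(\lambda_{i-1}) \}$, which is greater than zero by assumption. Then

$$\text{Prob}\{ \hat L_N(\lambda_i) < \hat L_N(\lambda_{i-1}) \text{ i.o.} \}
= \text{Prob}\{ [\hat L_N(\lambda_i) - L(\lambda_i)] + [L(\lambda_i) - L(\lambda_{i-1})] + [L(\lambda_{i-1}) - \hat L_N(\lambda_{i-1})] < 0 \text{ i.o.} \}$$

$$\le \text{Prob}\{ [\hat L_N(\lambda_i) - L(\lambda_i)] < -m/2 \text{ i.o.} \} + \text{Prob}\{ [\hat L_N(\lambda_{i-1}) - L(\lambda_{i-1})] > m/2 \text{ i.o.} \} = 0 .$$

The inequality holds because the middle bracket is at least m, so the sum can be negative only if one of the two outer brackets is below -m/2. Q.E.D.

We note that the subtraction of the average $\bar L_N$ clearly does not affect monotonicity.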

We now explicitly define the proposed adaptive estimate of location, or adaptive R-estimate. We have that

$$T(X, Y-e) = \sum_{i=1}^{k+1} L^*(\lambda_i) \Big[ \sum_{j=P(i-1)+1}^{P(i)} \omega_j(-e) \Big] .$$

Now define

$$e_N^* = \sup\{ e : T(X, Y-e) > 0 \} ,$$

$$e_N^{**} = \inf\{ e : T(X, Y-e) < 0 \} ,$$

and finally let

$$(4.1.3)\quad \hat e_N = (e_N^* + e_N^{**})/2 .$$

We shall only consider the asymptotic theory for the estimator $\hat e_N$ in the genuine two-sample case, since the theory transfers immediately to the one-sample case.

We now give some results that follow immediately from the work of Hodges and Lehmann, and Van Eeden. The determination of the asymptotic distribution of $\hat e_N$ will follow directly from these results.

LEMMA 8. For N sufficiently large, T(X, Y+e) is a non-decreasing function of e for all X and Y, provided $L(\lambda_i)$ is an increasing function of i.

PROOF. By Lemma 7, with probability one there will be some $N_0$ such that for all $N > N_0$, $L^*(\lambda_i) < L^*(\lambda_{i+1})$, $i = 0, 1, \ldots, k$. Without loss of generality we can assume that $e > 0$. Let $\{\omega_i(e)\}$ denote the $\omega$-vector created by adding e to every $Y_j$. Then $\{\omega_i(e)\}$ will have the 1's in the same or higher positions than the 1's in $\{\omega_i(0)\}$, since the second sample will be shifted in the positive direction. Hence T(X, Y+e) is the sum of the same or larger $L^*(\lambda_i)$'s than those contributing to T(X,Y). Therefore T(X, Y+e) is a non-decreasing function of e for all X and Y with probability one for all sufficiently large N. Q.E.D.

LEMMA 9. For T given by (4.1.1),

$$T(X, Y) + T(Y, X) = 0 .$$

PROOF. By (4.1.1),

$$T(X, Y) + T(Y, X) = \sum_{i=1}^{k+1} L^*(\lambda_i) \Big[ \sum_{j=P(i-1)+1}^{P(i)} \omega_j \Big] + \sum_{i=1}^{k+1} L^*(\lambda_i) \Big[ \sum_{j=P(i-1)+1}^{P(i)} (1 - \omega_j) \Big]
= \sum_{i=1}^{k+1} L^*(\lambda_i)(P(i) - P(i-1) - 1) = 0 \quad \text{by (3.1.1)},$$

since $\omega_j(X, Y) = 1 - \omega_j(Y, X)$, and $L^*(\lambda_i)$ is a symmetric function of the X and Y samples. Q.E.D.

LEMMA 10. Under the assumption that $L(\lambda_i)$ is increasing in i, the distribution of $\hat e_N$ is (absolutely) continuous provided F is (absolutely) continuous.

PROOF. The result follows directly from Theorem 1 of Hodges and Lehmann [12]. Q.E.D.

LEMMA 11. Under the assumption that $L(\lambda_i)$ is an increasing function of i, the distribution of the estimate $\hat e_N$ is symmetric about e, the difference in location between X and Y.

PROOF. By Lemma 9, T(X,Y) + T(Y,X) = 0. This is condition (3.3) in Hodges and Lehmann [12] for $\mu = 0$. We showed in Section 2.7 that

$$T(X + e, Y + e) = T(X, Y) .$$

Therefore, we have that $\hat e_N$ is symmetric about e by Theorem 2 of Hodges and Lehmann [12]. Q.E.D.

LEMMA 12. Under the assumption that $L(\lambda_i)$ is increasing in i, we have for any real number $e > 0$

$$\text{Prob}\{ T(X, Y-e) < 0 \} \le \text{Prob}\{ \hat e_N < e \} \le \text{Prob}\{ T(X, Y-e) \le 0 \} .$$

PROOF. This result follows directly from Lemma 4 of Hodges and Lehmann [12]. Q.E.D.

We  are  now  ready  to  give  the  asymptotic  distribution  of  the 
proposed  estimate. 

THEOREM 4. Under conditions 1) through 5) of Section 3.1, assuming the $L(\lambda_i)$ are monotone, we have that

$$N^{1/2} (\hat e_N - e)$$

converges in distribution to N(0, V), where

$$V = \frac{4 \int_0^1 L(u)^2\, du}{\{ \int_0^1 L(u) J(u)\, du \}^2} ,$$

where J(u) is given by $-f'(F^{-1}(u)) / f(F^{-1}(u))$.

PROOF. Because of equivariance, we may let e, the difference in location between the two samples, equal 0 without loss of generality. We have shown in the Corollary to Theorem 2 that for $d_N = 2b/(N\, I(f))^{1/2}$

$$(4.1.4)\quad \lim_{N \to \infty} \text{Prob}_{d_N} \{ (1/\sigma)[T(X, Y) - \mu] \le u \} = \Phi(u) ,$$

where $\text{Prob}_{d_N}\{\ \}$ indicates the probability is evaluated under the condition of a difference of $d_N$ in the two samples, and

$$\mu = \{ b N^{1/2} / (2\, I(f)^{1/2}) \} \int_0^1 L(u) J(u)\, du , \qquad \sigma^2 = (1/4)\, N \int_0^1 L(u)^2\, du .$$

We have

$$P = \lim_{N \to \infty} \text{Prob}_0 \{ N^{1/2} \hat e_N \le t \} = \lim_{N \to \infty} \text{Prob}_0 \{ T(X, Y - [t/N^{1/2}]) \le 0 \}$$

by Lemmas 10 and 12. Now let

$$b = t\, [I(f)^{1/2}] / 2 .$$

Then

$$P = \lim_{N \to \infty} \text{Prob}_0 \{ T(X, Y - [2b/[N\, I(f)]^{1/2}]) \le 0 \}
= \lim_{N \to \infty} \text{Prob}_{d_N} \{ T(X, Y) \le 0 \}
= \lim_{N \to \infty} \text{Prob}_{d_N} \{ (1/\sigma)[T(X, Y) - \mu] \le (1/\sigma)\mu \}$$

$$= \Phi\Big( \frac{b \int_0^1 L(u) J(u)\, du}{[\int_0^1 L(u)^2\, du]^{1/2}\, I(f)^{1/2}} \Big)
= \Phi\Big( (t/2)\, \frac{\int_0^1 L(u) J(u)\, du}{[\int_0^1 L(u)^2\, du]^{1/2}} \Big) .$$

This is equivalent to the statement that $N^{1/2} \hat e_N$ converges in distribution to N(0, V), where

$$V = \frac{4 \int_0^1 L(u)^2\, du}{\{ \int_0^1 L(u) J(u)\, du \}^2} .$$

Q.E.D.


4.2.  Description  of  Monte  Carlo  Procedures 

It is important in simulations to increase the effective sample size of a Monte Carlo calculation by employing certain variance-reduction techniques. We shall use one of the best known of these techniques, which was prominently featured in the Princeton Robustness Study; see Andrews et al. [2]. This method has come to be called the Princeton Swindle. Descriptions of this procedure appear in Andrews et al. [2] and Gross [9], and perhaps the most detailed treatment may be found in Simon [26].

We will now describe the generation of random numbers from the various distributions needed. To implement the variance reduction procedure we must always represent these observations as normal random variables divided by independent non-negative random variables. The uniform random number generator used was of the usual modulo overflow type. The particular generator chosen produces eight significant digits and proceeds through all $10^8$ numbers before repeating. To generate normal random numbers, we generate a bivariate normal pair using polar coordinates and transform to Cartesian coordinates. This is a frequently employed algorithm which conveniently generates two independent univariate normals from two independent uniforms.
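The polar-coordinate construction described above is commonly known as the Box-Muller transform. A minimal sketch follows, assuming two independent uniforms on (0, 1) are supplied by the caller; the function name is hypothetical and this is not the original program.

```python
import math


def normal_pair_from_uniforms(u1, u2):
    """Map two independent uniforms on (0, 1) to two independent N(0, 1) values.

    The radius satisfies r^2 = -2 ln(u1), which is chi-square with 2 degrees
    of freedom, and the angle is uniform on [0, 2*pi); the pair is then
    converted from polar to Cartesian coordinates.
    """
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)
```

For example, `normal_pair_from_uniforms(random.random(), random.random())` yields one pair per call, provided the first uniform is not exactly zero.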

To generate random numbers which have a Laplace distribution
and which have a representation as a normal divided by an independent
random variable, we follow Andrews and Mallows [3]. If $1/(2V^2)$
has an exponential distribution, then a normal random variable
divided by $V$ will have a Laplace distribution. Therefore take a
uniformly distributed random variable $u$ and let
$V = 1/\sqrt{-2\ln(u)}$.
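
A short sketch of this construction is given below. It assumes the
exponential variable has unit mean, so that $1/(2V^2) = -\ln(u)$, and
it multiplies by $s = 1/V$ rather than dividing by $V$ so that no
division by zero can occur. The name laplace_draw is hypothetical.

```python
import math
import random


def laplace_draw(rng):
    """Return (x, z, s): x = z * s is Laplace, with s = 1/V = sqrt(-2 ln u)."""
    z = rng.gauss(0.0, 1.0)              # the normal numerator
    u = 1.0 - rng.random()               # uniform on (0, 1]
    s = math.sqrt(-2.0 * math.log(u))    # s = 1/V, so 1/(2 V^2) = -ln(u)
    return z * s, z, s                   # dividing z by V is multiplying by s


rng = random.Random(2024)  # arbitrary seed
xs = [laplace_draw(rng)[0] for _ in range(100_000)]
mean_abs = sum(abs(x) for x in xs) / len(xs)
second_moment = sum(x * x for x in xs) / len(xs)
print(round(mean_abs, 2), round(second_moment, 2))
```

For the Laplace density $(1/2)e^{-|x|}$ the mean absolute value is 1
and the variance is 2, which gives a quick check on the output. The
function also returns the normal numerator and the multiplier, since
the variance reduction procedure needs both components.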

To generate random variables which have a logistic distribution,
we again follow Andrews and Mallows [3]. If $K$ has the
distribution of the Kolmogorov distance statistic, then if $Z$ has a
standard normal distribution, $Z/(1/(2K))$ has a logistic
distribution. To generate $K$ we use the fact that
$2K^2 = \sum_{i=1}^{\infty} W_i/i^2$, where the $W_i$ are
independent exponential variables.
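
The series representation can be sketched as follows. Two assumptions
are made that the text does not state: the $W_i$ are taken to have unit
mean, and the infinite series is truncated after a fixed number of
terms (the parameter terms below, a hypothetical choice). Truncation
slightly reduces the variance of the result.

```python
import math
import random


def logistic_draw(rng, terms=50):
    """Return 2*K*Z with 2*K^2 approximated by a truncated series."""
    # 2 K^2 = sum_{i>=1} W_i / i^2, truncated after `terms` terms.
    two_k_squared = sum(
        rng.expovariate(1.0) / (i * i) for i in range(1, terms + 1)
    )
    k = math.sqrt(two_k_squared / 2.0)
    z = rng.gauss(0.0, 1.0)
    return 2.0 * k * z  # same as Z / (1/(2K))


rng = random.Random(7)  # arbitrary seed
xs = [logistic_draw(rng) for _ in range(20_000)]
sample_var = sum(x * x for x in xs) / len(xs)
print(round(sample_var, 2), round(math.pi ** 2 / 3.0, 2))
```

The standard logistic distribution has variance $\pi^2/3$, so the two
printed numbers should be close.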

The generation of random variables with the representation as a
normal divided by an independent random variable is straightforward
for the Cauchy, slash, contaminated slashes, contaminated normals,
and wild normals. See Bell [4], or Andrews, et al. [2] for
discussions of the above distributions.

In  order  to  improve  the  performance  of  the  random  number 
generator  we  make  use  of  a  random  scrambler.  An  array  composed  of 
100  cells  is  used  to  store  a  random  number  until  the  cell  containing 
that  number  is  randomly  addressed. 
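
One way such a scrambler might be organized is sketched below. The
text does not say how the random address is chosen or how an emptied
cell is refilled; here both are assumed to use fresh draws from the
same underlying generator. The class name Scrambler is hypothetical.

```python
import random


class Scrambler:
    """Shuffle the output order of a uniform generator through a table."""

    def __init__(self, source, cells=100):
        self._source = source  # callable returning a uniform on [0, 1)
        self._table = [source() for _ in range(cells)]

    def next(self):
        # Address a cell at random (assumption: a fresh draw picks the cell).
        index = int(self._source() * len(self._table))
        value = self._table[index]
        self._table[index] = self._source()  # refill the emptied cell
        return value


base = random.Random(99)  # stand-in for the modulo overflow generator
scrambler = Scrambler(base.random, cells=100)
outputs = [scrambler.next() for _ in range(1000)]
print(min(outputs) >= 0.0, max(outputs) < 1.0)
```

Each stored number stays in its cell until that cell happens to be
addressed, so the output order differs from the generation order.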

In order to estimate the variance of the estimates of
variance arrived at in the simulations, summary statistics were
printed out every 500 repetitions. The variance of these summary
statistics was then used to estimate the variance of the estimates
of variance. This is a device commonly used in performing monte
carlo simulations.
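
The batching device can be illustrated as follows. The sketch assumes
that the summary statistic for each batch of 500 repetitions is the
sample variance of the location estimates in that batch, and that the
overall estimate is the mean of the batch values. The sample mean of 20
normal observations is used only as a stand-in estimator, and the name
batched_variance is hypothetical.

```python
import random
import statistics


def batched_variance(estimates, batch_size=500):
    """Return (variance estimate, its standard error) from batch summaries."""
    batches = [
        estimates[i:i + batch_size]
        for i in range(0, len(estimates) - batch_size + 1, batch_size)
    ]
    batch_vars = [statistics.variance(b) for b in batches]
    overall = statistics.mean(batch_vars)
    # Spread of the batch summaries gives the error of the overall figure.
    std_error = statistics.stdev(batch_vars) / len(batch_vars) ** 0.5
    return overall, std_error


rng = random.Random(11)  # arbitrary seed
n = 20
estimates = [
    statistics.mean(rng.gauss(0.0, 1.0) for _ in range(n))
    for _ in range(5000)
]
overall, std_error = batched_variance(estimates)
print(round(n * overall, 3), round(n * std_error, 3))
```

The first printed number is the estimated variance times the sample
size (near 1 for the mean of normal data); the second is its estimated
standard deviation.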

4.3.  Improving  Small  Sample  Performance 

The estimate of location described above was modified
slightly in several ways in order to improve small sample performance.
At the suggestion of John Tukey the coefficients of the rank
test were smoothed so that the coefficients of consecutive
$\omega_i$'s were forced to be relatively close in value. It was
hoped that this smoothing would decrease the small sample variance of
the location estimate. Two smoothing procedures for the estimate of
$L(u)$ were investigated. $L^*(u)$ is a step function on [0,1], the
steps occurring

at  i/N+1,  for  i=  1,...,N.  Define  1^  =  +  Ap/2  for  i=  1,...,N. 

Thus a smoothed version of $L^*(u)$ for $l_i \le u \le l_{i+1}$,
$0 < i < N$, is defined as the line connecting $L^*(l_i)$ and
$L^*(l_{i+1})$. The second method was similar to the first except
that the smoothing was only done for
$\lambda_{?} \le u \le \lambda_{?}$. This second method outperformed
the first, and both produced substantial improvement over the
non-smoothed version. We note that the smoothed versions do not
converge to Gastwirth's asymptotically most powerful grouped rank
test.

The necessary symmetry of the coefficients due to the underlying
symmetry of the parent distribution may be exploited in the
estimation of these coefficients. Thus the estimates of the
coefficients of $\omega_i$ and $\omega_{2n+1-i}$ may be averaged and
the average substituted for each of the two estimates. This slightly
improves small-sample performance. This procedure may also be
regarded as a smoothing device.

Smoothing of the estimate $\hat{G}$ of $G$ was also investigated.
The procedure which was finally selected used

$$G(u) = (1/2)\, G(i/(N+1)) + (1/2)\, G((i+1)/(N+1)),$$

for $i/(N+1) \le u < (i+1)/(N+1)$.

We also implemented the normal kernel nearest neighbor method
to smooth the estimate of $f(u)$. (See Moore and Yackel [21].) This
yields

$$\hat{f}_n(u) = \frac{1}{n R(n)} \sum_{i=1}^{n} K\left[ \frac{u - x_i}{R(n)} \right],$$

where $K(z)$ takes the form of a normal kernel.

Note: the ordinary nearest neighbor estimate corresponds to the
uniform kernel

$$K(z) = \begin{cases} 1 & \text{if } |z| < 1, \\ 0 & \text{otherwise.} \end{cases}$$

The results of Devroye and Wagner [6] concerning the convergence
properties of nearest neighbor density estimates apply to a general
class of kernels which includes both the normal and uniform kernels.
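
A minimal sketch of the kernel nearest neighbor estimate follows. It
assumes that $R(n)$ is the distance from $u$ to its $k$-th nearest
observation, which this section does not state explicitly. The uniform
kernel is given height 1/2 so that it integrates to one; this
normalization is our assumption and differs from the value 1 printed
above. All function names are hypothetical.

```python
import math
import random


def normal_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def uniform_kernel(z):
    # Height 1/2 on (-1, 1): an assumed normalization, see the lead-in.
    return 0.5 if abs(z) < 1.0 else 0.0


def nn_density(u, data, k, kernel=normal_kernel):
    """Kernel nearest neighbor density estimate at the point u."""
    distances = sorted(abs(u - x) for x in data)
    radius = distances[k - 1]  # assumed R(n): distance to k-th neighbor
    if radius == 0.0:
        raise ValueError("k-th nearest neighbor distance is zero")
    n = len(data)
    return sum(kernel((u - x) / radius) for x in data) / (n * radius)


rng = random.Random(3)  # arbitrary seed
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
f_normal = nn_density(0.0, data, k=200)
f_uniform = nn_density(0.0, data, k=200, kernel=uniform_kernel)
print(round(f_normal, 3), round(f_uniform, 3))
```

For standard normal data the true density at zero is about 0.399, so
both estimates should fall near that value.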

It was necessary, finally, to search empirically for
combinations of parameter values for the estimator which in some
sense maximize its performance on long and short-tailed
distributions. The objective was to simultaneously achieve high
efficiency for the normal, the 25% contaminated normal (with the
slash, denoted .25 1/U), and the Cauchy. This criterion is similar
to Tukey's concept of tri-efficiency. The efficiency for the normal
was considered to be of greater importance than the efficiencies for
the other two distributions. Also, performance relative to the other
adaptive estimators in the literature was a consideration.

In our efforts to optimize performance, we needed to
determine how many fractiles to use, which fractiles to use, and how
many nearest neighbors to use in the density estimate. We soon
determined that having more than six fractiles led to poor
performance. The optimal number of nearest neighbors for a sample
of size 20 was determined to be 11, and for a sample of size 40 the
best number was 19. These numbers were relatively insensitive to the
choice of the fractiles. The choice of the fractiles was much more
crucial. In the investigation of the performance for long-tailed
distributions, it was found that by taking the fractiles to be
.30, .50, .70, 1.0, we could nearly equal the performance of the
median for those distributions, but that the efficiency for the
normal then fell below 80%. It was found that in order to attain
80% efficiency for the normal the fractiles .15, .50, .85, 1.0
sufficed.

One of the most unpleasant features of the computation of
R-estimators is the necessity of re-sorting the combined sample
every time a new value is attempted in the minimization procedure.
In an attempt to mitigate this difficulty, various one-step
procedures using Taylor series were tried, as well as certain
extrapolation methods. None of these was found to be satisfactory.

Therefore, the minimization was ultimately performed as
follows. The interval from the first to the third quartile was
divided into 100 small intervals by 99 equally spaced points. The
test statistic (4.1.1) was evaluated at each of the 99 points as
well as at the two quartiles. The point at which this test
statistic achieved its minimum absolute value was taken to be
$\hat{e}_N$. If this value was achieved by more than one point, that
point closest to the median was taken as $\hat{e}_N$. This procedure
uses the sample quantiles in such a way as to ensure location and
scale invariance for $\hat{e}_N$.
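
The grid search can be sketched as follows. Since the statistic
(4.1.1) depends on the estimated scores, the sketch substitutes an
ordinary signed-rank statistic with Wilcoxon scores as a stand-in; that
substitution, the quartile convention of Python's statistics module,
and the function names are all assumptions made for illustration.

```python
import random
import statistics


def signed_rank_statistic(data, theta):
    """Stand-in rank statistic: signed ranks of |x - theta| (Wilcoxon scores)."""
    centred = [x - theta for x in data]
    order = sorted(range(len(centred)), key=lambda i: abs(centred[i]))
    total = 0
    for rank, i in enumerate(order, start=1):
        if centred[i] > 0:
            total += rank
        elif centred[i] < 0:
            total -= rank
    return total


def grid_r_estimate(data, statistic=signed_rank_statistic, cells=100):
    """Minimize |statistic| over the two quartiles and 99 interior points."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    grid = [q1 + (q3 - q1) * j / cells for j in range(cells + 1)]
    # Ties in |statistic| are broken by distance to the sample median.
    return min(grid, key=lambda t: (abs(statistic(data, t)), abs(t - median)))


rng = random.Random(5)  # arbitrary seed
sample = [rng.gauss(5.0, 1.0) for _ in range(40)]
estimate = grid_r_estimate(sample)
print(round(estimate, 3))
```

Because the grid is built from the sample quartiles, shifting or
rescaling the data shifts or rescales every candidate point in the same
way, which is the source of the invariance noted above.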

4.4.  Monte  Carlo  Results  for  the  Adaptive  R-Estimate 

We  present  simulation  results  for  the  proposed  one-sample 
R-estimate  and  make  comparisons  with  other  adaptive  estimators  in 
Table  2  below.  The  location  estimates  considered  are  those  proposed 
by  Johns  [17],  Hodges  and  Lehmann  [12],  Stone  [28],  and  Takeuchi  [30]. 
Cauchy  15K/20K  3.1  (.03)  2.4  3.3  2.5  (.02)


These are designated JOH, H-L, STONE, and TAK, respectively. The
column labeled "BEST" lists the best performance for each
distribution produced by any of the 65 estimates studied by Andrews,
et al. [2]. The results for the proposed estimate $\hat{e}_N$ and the
estimate proposed by Stone [28] were determined by monte carlo
simulation. The results for the other estimates are taken from
Andrews, et al. [2]. The first column lists the distribution. The
second column lists the number of repetitions for the simulations for
the proposed estimate over the number performed for Stone's estimate.
For each of the estimates considered the estimated variance times the
sample size is given. Standard deviations are given for the estimates
for $\hat{e}_N$ and Stone's estimate.

4.5.  Comparison  of  Results  With  Other  Adaptive  Estimates 

The study of robust estimation of a location parameter has
concentrated on three classes of estimates: M-, L-, and R-estimates.
R-estimates have been discussed earlier in the present paper. Huber
[14] developed the pure M-estimators, which are solutions to
equations of the form

$$\sum_{i=1}^{n} \psi[x_i - M] = 0,$$

as well as certain modifications introduced to ensure scale
invariance. L-estimates are linear combinations of order statistics.
It was of interest to compare the estimate proposed here with the
Hodges-Lehmann estimate since the latter is the best known R-estimate
and the only one included in the Princeton Robustness Study (Andrews,
et al. [2]). It was also appropriate to compare the proposed
estimate with other nearly fully asymptotically efficient adaptive
procedures. It should be noted that such procedures do not become
leading contenders in the area of robust estimation of a location
parameter until sample sizes are greater than 40 (see Hogg [13]).

R-estimates  are  inherently  scale  invariant,  a  feature  that  is 
not  shared  by  the  most  thoroughly  studied  family  of  robust  estimates, 
namely,  the  pure  M-estimates.  The  proper  method  for  obtaining  scale 
invariance  is  a  bothersome  question  in  M-estimation.  If  R-estimation 
can  be  shown  to  be  viable,  then  this  inherent  advantage  might 
encourage  investigators  to  return  to  the  study  and  improvement  of 
R-estimators. On the other hand, if the price paid for this
invariance is too high, then the recent lack of interest in this area
would appear to be justified.

Since we are to focus our attention on nearly fully
asymptotically efficient procedures, we shall henceforth call such
procedures ultra-adaptive to distinguish them from estimates which are
adaptive  for  only  a  specific  parameter  and  make  no  attempt  at  full 
asymptotic  efficiency.  For  instance,  M-estimates  are  all  adaptive 
to  some  degree  as  a  consequence  of  the  behavior  of  the  method  used  to 
obtain  scale  invariance.  In  order  to  determine  which  estimates  to 
compare  with  the  one  proposed  here,  we  need  to  examine  those  M-  and 
L-estimators  which  attempt  to  gain  sufficient  information  about  the 
distribution  to  form  an  asymptotically  optimal  estimate. 
In L-estimation, at least two moderately successful attempts
to accomplish this goal have been made. Johns [17] and Takeuchi
[30] both developed estimators which estimate the coefficients of
an L-estimate by methods which minimize an estimate of the variance.
Stone [28] has proposed an estimator similar to an M-estimator
which involves an estimate of the optimal $\psi$ in the relation

$$\sum_{i=1}^{n} \psi[(x_i - M)/S] = 0,$$

where S is a suitably chosen scale-invariant statistic. None of the
other estimators we examined which attempted to achieve at least
nearly full asymptotic efficiency were supported by small sample
results.

Examination  of  the  above  simulations  shows  that  the  adaptive 
R-estimate  developed  here  performs  less  well  than  those  of  Stone, 
Takeuchi,  and  Johns.  The  proposed  estimator  compares  well  with 
Takeuchi  on  20  data  points;  and  on  distributions  with  longer  tails 
than  the  normal,  it  appears  to  dominate  the  Hodges-Lehmann  estimator. 

It was, however, nearly competitive, which is perhaps
surprising considering the general opinion about the sample size
necessary for the successful application of the approach used here
(viz. Huber [15], Hogg [13], and Hajek [10]). The limitation
of the number of fractiles to be less than five did not allow us to
compete very well on both long-tailed distributions and the normal,
because it is difficult to pick the middle fractiles so as to give
good over-all performance. If these are moved from .15 and .85 to
.30 and .70, then the long-tailed distributions are accommodated
since $L^*(u)$ is then constant over [0, .3) and (.7, 1.0], causing
$\hat{e}_N$ to behave somewhat like the median. If we move the middle
fractiles toward 0 and 1, then the sharp rise in J(u) necessary for
good performance on short-tailed distributions may occur, but at the
expense of possibly inaccurate density estimation in the tails of
long-tailed distributions. This may cause the coefficients of the
rank test to be too large for u near zero and one. This discussion
may seem pessimistic with regard to the possibility of efficiently
using R-estimates in the problem of location estimation; but, in
fact, the performance of the proposed estimator is rather better than
many investigators would have thought possible. It is not
unreasonable to believe that better smoothing techniques or density
estimation procedures might further improve the small-sample
variances of such R-estimators.

4.6.  Summary 

In conclusion, we recapitulate the points which have been
made concerning adaptive R-estimation. First, an adaptive
R-estimate has been constructed which is certainly a reasonably
robust estimate of location. The proposed estimate outperforms most
previous expectations for an estimator of this type, although there
exist estimators already in the literature which are clearly
superior to the proposed R-estimator. An explanation of this fact
probably lies in the loss of information for small samples that
accompanies the use of ranks. Also, estimation in the tails is
crucial; and on samples of size 20 and 40, it seems that it is not a
simple matter to estimate the density sufficiently accurately in the
tails.

It  may  be  that  the  future  of  R-estimation  lies  in  a  different 
type  of  adaptation.  A  possible  approach,  for  example,  would  be  to 
consider  a  family  of  score  functions  indexed  by  a  parameter  p.  One 
could  then  approximate  the  "best"  p  by  minimizing  an  estimate  of  the 
asymptotic  variance  of  the  R-estimate  as  a  function  of  p.  There  is 
some  reason  to  hope  that  the  performance  of  such  an  adaptive  estimate 
could approach that of the leading contenders in the area of robust
estimation.